dbTEST: A Database of Transcripts Expressed in Spermatogenesis and Testis. R.G. Halgren, M.R. Fielden, and T.R. Zacharewski

dbTEST (http://www.bch.msu.edu/~zacharet/dbtest/) is a database of all genes currently known to be expressed in the murine testis, and will be used to construct high-density cDNA arrays for toxicogenomic analyses of endocrine disruption in the mouse. The NCBI UniGene, Jackson Laboratory Mouse Gene Expression Database (GXD), and National Library of Medicine (MEDLINE) databases were searched to identify genes expressed in mouse testicular tissue. Combining these publicly available data yielded a more comprehensive database than any one of these sources could provide alone. As of build 73 of the UniGene database, 4877 genes are represented in dbTEST. These include both known genes and genes with significant homology to known genes from other species. Of these, 4170 (85.5%) were obtained from the UniGene database as clusters containing at least one EST sequenced from a library derived from murine testis. Analysis of GXD and MEDLINE for genes expressed in the testis or its component cell types identified an additional 335 (8.03%) and 268 (6.42%) genes, respectively. In addition to identifying genes experimentally shown to be expressed in the murine testis, the literature search also identified genes known to be involved in testis function in humans or rats. This led to the selection of a significant number of homologous genes in the murine databases that would not otherwise have been included in dbTEST. Of the 268 genes identified by the MEDLINE search, 151 (56%) were derived from mouse, 98 (37%) from rat, and the remaining nine (4%) from human model systems. Gene expression analysis of 588 genes using a commercially available array identified 171 genes expressed at constitutively high levels in murine testis. Of these 171, 107 (63%) were already contained in dbTEST. The remaining 64 (37%) were added to dbTEST, bringing the final total of known genes in the MTT to 4877.
This approach is an inexpensive, efficient, and productive method for creating comprehensive cell type-, tissue-, or organism-specific transcriptomes, and takes full advantage of the exhaustive cDNA sequencing efforts of other researchers.
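
The merging of several sources into one non-redundant gene list, as described above, can be illustrated with a minimal Python sketch. It is not the procedure used to build dbTEST. It assumes each source has already been reduced to a list of gene symbols, that matching is by case-insensitive symbol, and that sources are processed in a fixed priority order. The function names `merge_sources` and `percent_contribution` are hypothetical, and the gene symbols are toy input, not data taken from dbTEST.

```python
def merge_sources(sources):
    """Merge gene lists from several sources into one non-redundant set.

    sources: list of (source_name, iterable_of_gene_symbols), in priority order.
    Returns (merged, added), where merged maps each normalized symbol to the
    first source that supplied it, and added maps each source name to the
    number of genes it contributed that no earlier source had supplied.
    """
    merged = {}
    added = {}
    for name, genes in sources:
        new_genes = 0
        for gene in genes:
            # Assumption: symbols match if equal after trimming and upper-casing.
            key = gene.strip().upper()
            if key and key not in merged:
                merged[key] = name
                new_genes += 1
        added[name] = new_genes
    return merged, added


def percent_contribution(added):
    """Express each source's newly added genes as a percentage of the merged total."""
    total = sum(added.values())
    if total == 0:
        return {name: 0.0 for name in added}
    return {name: round(100.0 * n / total, 2) for name, n in added.items()}


# Toy input for illustration only (not taken from dbTEST).
sources = [
    ("UniGene", ["Prm1", "Tnp1", "Acr"]),
    ("GXD", ["Acr", "Sycp3"]),
    ("MEDLINE", ["Prm1", "Amh"]),
]

merged, added = merge_sources(sources)
shares = percent_contribution(added)

for name in added:
    print(f"{name}: {added[name]} new genes ({shares[name]}% of {len(merged)})")
```

Because later sources are credited only for genes not already present, the per-source counts sum to the size of the merged set, so the percentages share a single denominator.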