Assessment of Clone Identity and Sequence Fidelity Within a Commercially Distributed Subset of IMAGE Clones

R.G. Halgren, M.R. Fielden, C.J. Fong, T.R. Zacharewski

Abstract

This report documents the error rate in a commercially distributed subset of the IMAGE Consortium mouse cDNA clone collection. After isolation of plasmid DNA from 1189 bacterial stock cultures, only 62.2 percent were uncontaminated and contained cDNA inserts with significant sequence identity to published data for the ordered clones. An agarose gel electrophoresis prescreening strategy identified 361 stock cultures that appeared to contain two or more plasmid species. Isolation of individual colonies from these stocks demonstrated that 7.1 percent of the original 1189 stocks contained both a correct and an incorrect plasmid. A further 5.9 percent of the original 1189 stocks contained multiple, distinct, incorrect plasmids, indicating the likelihood of multiple contaminating events. While only 739 of the stocks purchased contained the desired cDNA clone, agarose gel prescreening, colony isolation, and similarity searching of dbEST allowed for the identification of an additional 420 clones that would otherwise have been discarded. Considering the high error rate in this subset of the IMAGE cDNA clone set, the use of sequence-verified clones for cDNA microarray construction is warranted.
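As a quick arithmetic check on the figures quoted in the abstract, the short Python sketch below recomputes the 62.2 percent value from the 739 correct stocks out of 1189, and back-calculates rough stock counts from the 7.1 and 5.9 percent figures. The function and variable names are illustrative only, and the back-calculated counts are approximations derived from rounded percentages, not values reported by the authors.

```python
# Figures quoted in the abstract.
TOTAL_STOCKS = 1189
CORRECT_STOCKS = 739


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def approx_count(pct, total):
    """Back-calculate an approximate count from a rounded percentage.

    The result is only an estimate: the published percentage is itself
    rounded, so the true count may differ by one.
    """
    return round(pct / 100.0 * total)


# Fraction of stocks that were uncontaminated and matched the ordered clone.
correct_pct = percent(CORRECT_STOCKS, TOTAL_STOCKS)

# Approximate number of stocks holding both a correct and an incorrect plasmid.
mixed_count = approx_count(7.1, TOTAL_STOCKS)

# Approximate number of stocks holding multiple distinct incorrect plasmids.
multi_incorrect_count = approx_count(5.9, TOTAL_STOCKS)

print(correct_pct, mixed_count, multi_incorrect_count)
```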

Supplementary Data

Description: Summary of results by stock ordered
Notes: This file presents the information for the clone order submitted to Research Genetics, including the IMAGE clone ID ordered, the library from which the clone is derived, the location in the IMAGE stock plates, and the location on the stock plates we received. Also included is an analysis of which stocks were correct, incorrect, contaminated, or failed to grow.
Available Files: per_stock.xls

Description: Sequence Verification Data Summary
Notes: Excel file with four summary pages:
  1 - Correct sequences from prescreened clones (primary)
  2 - Incorrect sequences from prescreened clones with BLAST data for clone identification
  3 - Correct sequences from isolated clones (secondary)
  4 - Incorrect sequences from isolated clones with BLAST data for clone identification
Available Files: webdata_totals.xls

Description: BLAST Data (Isolated Clones)
Notes: The results of attempts to identify unknown sequences obtained during physical isolation of clones. Up to 10 significant hits per query (significance defined as E < 1e-50) are examined against the UniGene database to obtain a UniGene cluster ID. Note that not all ESTs are found in a UniGene cluster.
Available Files: blast_negatives1.htm

Description: Comparison of Negative Clones
Notes: A pairwise comparison of all clones derived from a single stock well using the BLAST 2 Sequences tool. Note that short matches in the 5' region of a sequence were not considered to be evidence of identity; such matches are probably cloning vector sequence.
Available Files: compare_negatives.htm
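
The hit-selection rule described for the BLAST Data file (up to 10 hits per query, with significance defined as E < 1e-50) can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the input layout (tuples of query ID, subject ID, and E-value), the function name, and the choice to rank hits by ascending E-value before truncating are all assumptions made for the example.

```python
from collections import defaultdict

# Thresholds taken from the notes above.
E_VALUE_CUTOFF = 1e-50
MAX_HITS_PER_QUERY = 10


def select_significant_hits(hits, cutoff=E_VALUE_CUTOFF,
                            max_hits=MAX_HITS_PER_QUERY):
    """Keep up to max_hits hits per query with E-value strictly below cutoff.

    hits is an iterable of (query_id, subject_id, evalue) tuples; this
    layout is hypothetical. Queries with no significant hit are omitted
    from the result.
    """
    by_query = defaultdict(list)
    for query_id, subject_id, evalue in hits:
        if evalue < cutoff:  # strict inequality, matching E < 1e-50
            by_query[query_id].append((subject_id, evalue))

    # Assumption: the best (lowest E-value) hits are the ones retained.
    return {
        query_id: sorted(query_hits, key=lambda hit: hit[1])[:max_hits]
        for query_id, query_hits in by_query.items()
    }


# Made-up identifiers and E-values, for illustration only.
example_hits = [
    ("clone_A", "est_1", 1e-120),
    ("clone_A", "est_2", 1e-30),   # not significant, dropped
    ("clone_B", "est_3", 1e-50),   # equal to the cutoff, dropped
]
selected = select_significant_hits(example_hits)
print(selected)
```

The subject IDs retained for each query would then be looked up in UniGene; as the notes point out, some ESTs will not map to any cluster.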