Statistical Analysis of Genomic Tag Data

Author / Contributors:
[Thomas L LaFramboise, D. Neil Hayes, Torstein Tengs]
Place, publisher, year:
2004
Contained in:
Statistical Applications in Genetics and Molecular Biology, 3/1(2004-12-08), 1-22
Format:
Article (online)
ID: 378926012
LEADER caa a22 4500
001 378926012
003 CHVBK
005 20180305123617.0
007 cr unu---uuuuu
008 161128e20041208xx s 000 0 eng
024 7 0 |a 10.2202/1544-6115.1099  |2 doi 
035 |a (NATIONALLICENCE)gruyter-10.2202/1544-6115.1099 
245 0 0 |a Statistical Analysis of Genomic Tag Data  |h [Electronic data]  |c [Thomas L LaFramboise, D. Neil Hayes, Torstein Tengs] 
520 3 |a We present a series of statistical solutions to challenges that commonly arise in the production and analysis of genomic tag libraries. Tag libraries are collections of fragments of DNA or RNA, with each unique fragment often present in millions or billions of copies. Inferences can be made from data obtained by sequencing a subset of the library. The statistical approaches outlined in this paper are divided into three parts. First, we demonstrate the application of classical capture-recapture theory to the question of library complexity, i.e. the number of unique fragments in the library. Simulation studies verify the accuracy, for sample sizes of magnitudes typical in genomic studies, of the formulas we use to make our estimates. Second, we present a straightforward statistical cost analysis of tag experiments designed to uncover either disease-causing pathogens or new genes. Third, we develop a hidden Markov model approach to karyotyping a sample using a tag library derived from the sample's genomic DNA. While the resolution of the approach depends upon the number of tags sequenced from the library, we show via simulation that copy number alterations can be reliably detected for lengths as small as 1 Mb, even when a moderate number of tags are sequenced. Simulations predict very good specificity as well. Finally, all three of our approaches are applied to data from real tag library experiments. The hidden Markov model results are in line with what was expected from simulation, and genomic alterations found by applying the method to a cancer cell line library are confirmed using PCR. The methods and data described in this paper are contained in an R package, tagAnalysis, freely available at http://meyerson.dfci.harvard.edu/~tl974/tagAnalysis. 
540 |a ©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston 
690 7 |a Disease Modeling  |2 nationallicence 
690 7 |a Epidemiology  |2 nationallicence 
690 7 |a Genetics  |2 nationallicence 
690 7 |a Health Services Research  |2 nationallicence 
690 7 |a Computation  |2 nationallicence 
690 7 |a Computational Biology/Bioinformatics  |2 nationallicence 
690 7 |a genomic tag libraries  |2 nationallicence 
690 7 |a capture-recapture  |2 nationallicence 
690 7 |a hidden Markov model  |2 nationallicence 
690 7 |a karyotyping  |2 nationallicence 
700 1 |a LaFramboise  |D Thomas L.  |u Dana-Farber Cancer Institute  |4 aut 
700 1 |a Hayes  |D D. Neil  |u University of North Carolina  |4 aut 
700 1 |a Tengs  |D Torstein  |u Broad Institute of Harvard and MIT  |4 aut 
773 0 |t Statistical Applications in Genetics and Molecular Biology  |d De Gruyter  |g 3/1(2004-12-08), 1-22  |q 3:1<1  |1 2004  |2 3  |o sagmb 
856 4 0 |u https://doi.org/10.2202/1544-6115.1099  |q text/html  |z Online access via DOI 
908 |D 1  |a research article  |2 jats 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.2202/1544-6115.1099  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a LaFramboise  |D Thomas L.  |u Dana-Farber Cancer Institute  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Hayes  |D D. Neil  |u University of North Carolina  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Tengs  |D Torstein  |u Broad Institute of Harvard and MIT  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Statistical Applications in Genetics and Molecular Biology  |d De Gruyter  |g 3/1(2004-12-08), 1-22  |q 3:1<1  |1 2004  |2 3  |o sagmb 
900 7 |b CC0  |u http://creativecommons.org/publicdomain/zero/1.0  |2 nationallicence 
898 |a BK010053  |b XK010053  |c XK010000 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-gruyter