Relating HIV-1 Sequence Variation to Replication Capacity via Trees and Forests

Author / Contributors:
[Mark R Segal, Jason D Barbour, Robert M Grant]
Place, Publisher, Year:
2004
Contained in:
Statistical Applications in Genetics and Molecular Biology, 3/1(2004-02-12), 1-18
Format:
Article (online)
ID: 378926071
LEADER caa a22 4500
001 378926071
003 CHVBK
005 20180305123617.0
007 cr unu---uuuuu
008 161128e20040212xx s 000 0 eng
024 7 0 |a 10.2202/1544-6115.1031  |2 doi 
035 |a (NATIONALLICENCE)gruyter-10.2202/1544-6115.1031 
245 0 0 |a Relating HIV-1 Sequence Variation to Replication Capacity via Trees and Forests  |h [Electronic data]  |c [Mark R Segal, Jason D Barbour, Robert M Grant] 
520 3 |a The problem of relating genotype (as represented by amino acid sequence) to phenotypes is distinguished from standard regression problems by the nature of sequence data. Here we investigate an instance of such a problem where the phenotype of interest is HIV-1 replication capacity and contiguous segments of protease and reverse transcriptase sequence constitute genotype. A variety of data analytic methods have been proposed in this context. Shortcomings of select techniques are contrasted with the advantages afforded by tree-structured methods. However, tree-structured methods, in turn, have been criticized for offering only modest predictive performance. A number of ensemble approaches (bagging, boosting, random forests) have recently emerged, devised to overcome this deficiency. We evaluate random forests as applied in this setting, and detail why prediction gains obtained in other situations are not realized. Other approaches including logic regression, support vector machines and neural networks are also applied. We interpret results in terms of HIV-1 reverse transcriptase structure and function. 
540 |a ©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston 
690 7 |a Multivariate Analysis  |2 nationallicence 
690 7 |a Protease  |2 nationallicence 
690 7 |a Random Forests  |2 nationallicence 
690 7 |a Reverse Transcriptase  |2 nationallicence 
690 7 |a Tree-Structured Methods  |2 nationallicence 
700 1 |a Segal  |D Mark R.  |u University of California, San Francisco  |4 aut 
700 1 |a Barbour  |D Jason D.  |u University of California, San Francisco  |4 aut 
700 1 |a Grant  |D Robert M.  |u University of California, San Francisco  |4 aut 
773 0 |t Statistical Applications in Genetics and Molecular Biology  |d De Gruyter  |g 3/1(2004-02-12), 1-18  |q 3:1<1  |1 2004  |2 3  |o sagmb 
856 4 0 |u https://doi.org/10.2202/1544-6115.1031  |q text/html  |z Online access via DOI 
908 |D 1  |a research article  |2 jats 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.2202/1544-6115.1031  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Segal  |D Mark R.  |u University of California, San Francisco  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Barbour  |D Jason D.  |u University of California, San Francisco  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Grant  |D Robert M.  |u University of California, San Francisco  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Statistical Applications in Genetics and Molecular Biology  |d De Gruyter  |g 3/1(2004-02-12), 1-18  |q 3:1<1  |1 2004  |2 3  |o sagmb 
900 7 |b CC0  |u http://creativecommons.org/publicdomain/zero/1.0  |2 nationallicence 
898 |a BK010053  |b XK010053  |c XK010000 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-gruyter