Feature Weighted Models (FWM) to address lineage dependency in drug-resistance prediction from Mycobacterium tuberculosis genome sequences
Authors
Yu-Mei Chang
Dong Xia
Nina Billows
Abstract
Background: Tuberculosis is caused by members of the Mycobacterium tuberculosis complex (MTBC) and is the second leading infectious killer after COVID-19. The evolution of drug-resistance poses a threat to successful treatment and disease eradication. Whole genome sequencing combined with statistical and machine learning approaches is being increasingly adopted to predict drug-resistance and characterise underlying mutations. However, these approaches may not generalise well in clinical practice due to confounding from the clonal population structure of the MTBC.
Methods: To investigate how population structure affects machine learning prediction, we compare the performance of random forest (RF) models applied to a global dataset comprising 18,396 isolates (lineages 1-7; “global”) and to a subset containing isolates from two major lineages of the MTBC (lineages 2 and 4; n=10,464; “lineage-specific” (separate) or “combined”). To reduce lineage-dependency in the models, we derived weights from a phylogenetic tree using Fitch’s parsimony, which are used as probabilities for splitting nodes in the RF. The performance of feature weighted RF models was compared to that of unweighted models and a traditional feature selection approach using area under the ROC curve (AUC-ROC), sensitivity, specificity and F1 score. The importance of features driving performance was measured by Gini importance and the most frequent interactions in the model.
Results: All RF models achieved moderate to high performance (AUC-ROC range: 0.60-0.98). Models for first-line drugs performed better than those for second-line drugs, but performance varied depending on the drug-resistant phenotype and the lineages in the dataset. Lineage-specific models generally had higher sensitivity than global models, which may be underpinned by strain-specific drug-resistance mutations or sampling effects. Feature weighted RF models had comparable performance to the unweighted models, and the application of feature weights and traditional feature selection approaches reduced lineage-dependency in the model.
Conclusion: We show that predictive performance differs between lineages and that global predictions may not generalise well across all lineages. The application of feature weights mitigated confounding from population structure, but in some cases reduced the importance of strain-specific drug-resistance mutations and increased confounding from co-occurring phenotypes. This signifies the importance of addressing confounding in machine learning prediction whilst considering the complex genetic interactions underlying drug-resistance in tuberculosis.
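The feature-weighting idea described in the Methods can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fitch_changes`, `feature_weights`, `sample_split_candidates`), the toy tree and the weighting rule are all assumptions made for illustration. The sketch assumes that a variant's weight is its Fitch parsimony score (the minimum number of state changes on the tree) normalised over all variants, so that variants arising repeatedly across the tree are favoured over variants that simply mark one clade. The abstract does not state the exact formula used in the paper.

```python
import random


def fitch_changes(tree, states):
    """Minimum number of state changes for one variant on a rooted binary tree.

    tree   : nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping leaf name -> observed state (0 = absent, 1 = present)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):       # leaf: its state set is the observation
            return {states[node]}
        left = visit(node[0])
        right = visit(node[1])
        common = left & right
        if common:                            # children agree: keep the shared states
            return common
        changes += 1                          # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def feature_weights(tree, feature_table):
    """Assumed rule: weight = parsimony score / sum of all parsimony scores."""
    scores = {name: fitch_changes(tree, s) for name, s in feature_table.items()}
    total = sum(scores.values())
    if total == 0:                            # no variation at all: fall back to uniform
        return {name: 1.0 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}


def sample_split_candidates(weights, k, rng):
    """Draw up to k distinct features, each draw proportional to its weight."""
    pool = {name: w for name, w in weights.items() if w > 0}
    chosen = []
    for _ in range(min(k, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]                        # sample without replacement
    return chosen


# Toy example: eight isolates in two clades (hypothetical data).
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
FEATURES = {
    # Present in every isolate of one clade: behaves like a lineage marker.
    "lineage_marker": {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 0, "H": 0},
    # Present in scattered isolates of both clades: arises independently.
    "homoplastic_variant": {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 1, "G": 0, "H": 0},
}

WEIGHTS = feature_weights(TREE, FEATURES)
print(WEIGHTS)  # {'lineage_marker': 0.25, 'homoplastic_variant': 0.75}
print(sample_split_candidates(WEIGHTS, k=1, rng=random.Random(0)))
```

In a weighted random forest, a draw like `sample_split_candidates` would replace the uniform feature subsampling performed at each node, so clade-defining variants are offered as split candidates less often without being excluded outright.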
Citation
Chang, Y., Xia, D., & Billows, N. (2023). Feature Weighted Models (FWM) to address lineage dependency in drug-resistance prediction from Mycobacterium tuberculosis genome sequences. Bioinformatics, 39(7), Article btad428. https://doi.org/10.1093/bioinformatics/btad428
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 6, 2023 |
Publication Date | Jul 10, 2023 |
Deposit Date | Jul 6, 2023 |
Publicly Available Date | Jul 25, 2023 |
Print ISSN | 1367-4803 |
Electronic ISSN | 1460-2059 |
Publisher | Oxford University Press |
Peer Reviewed | Peer Reviewed |
Volume | 39 |
Issue | 7 |
DOI | https://doi.org/10.1093/bioinformatics/btad428 |
Publisher URL | https://academic.oup.com/bioinformatics/article/39/7/btad428/7222183 |
Files
Feature weighted models to address lineage dependency in drug-resistance prediction from Mycobacterium tuberculosis genome sequences
(5.3 Mb)
PDF
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
Version
VoR