Publication

Machine learning approaches in microbiome research

File: Machine_learning_approaches.pdf (4.09 MB, Adobe PDF)

Abstract

Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation, and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
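
As a rough illustration of two ideas named in the abstract, the sketch below applies a compositional transformation and then computes ICE curves for a logistic regression model. It is not the authors' pipeline. The abstract does not say which compositional transformation was used, so the centered log-ratio (CLR) transform is an assumed example; the pseudocount, the toy counts, and the model coefficients are all hypothetical values chosen for demonstration.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def predict_proba(weights, bias, x):
    """Logistic regression probability for one feature vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(weights, bias, samples, feature, grid):
    """One ICE curve per sample: vary `feature` over `grid`, hold the rest fixed."""
    curves = []
    for x in samples:
        curve = []
        for value in grid:
            x_mod = list(x)          # copy so the original sample is untouched
            x_mod[feature] = value   # overwrite only the feature of interest
            curve.append(predict_proba(weights, bias, x_mod))
        curves.append(curve)
    return curves


# Toy, made-up counts for three samples and three taxa (not from the study).
raw_counts = [[120, 30, 0], [15, 200, 45], [60, 60, 60]]
samples = [clr_transform(row) for row in raw_counts]

# Hypothetical coefficients, chosen only for illustration.
weights, bias = [0.8, -0.5, 0.2], -0.1
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = ice_curves(weights, bias, samples, feature=0, grid=grid)

for curve in curves:
    print([round(p, 3) for p in curve])
```

Each printed row is one sample's ICE curve. Plotting the rows against the grid would show how the predicted probability changes for each individual as the chosen feature varies. With a purely linear logistic model the curves share the same shape and differ only by a shift on the log-odds scale, which is part of why such models are easy to read.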

Description

Funding Information: We greatly thank Emmanuelle Le Chatelier and Pauline Barbet (Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France) for preparing the shotgun CRC benchmark dataset. We also thank Michelangelo Ceci (Department of Computer Science, University of Bari Aldo Moro, Bari, Italy) and Christian Jansen (Institute of Science and Technology, Austria) for their interim leadership of the Working Group 3 of the COST Action ML4Microbiome.

Funding Information: The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was based upon work from COST Action ML4Microbiome “Statistical and machine learning techniques in human microbiome studies” (CA18131), supported by COST (European Cooperation in Science and Technology, www.cost.eu). MB acknowledged support through the Metagenopolis grant ANR-11-DPBS-0001. ML acknowledged support by FCT - Fundação para a Ciência e a Tecnologia, I.P., with reference CEECINST/00042/2021.

Publisher Copyright: Copyright © 2023 Papoutsoglou, Tarazona, Lopes, Klammsteiner, Ibrahimi, Eckenberger, Novielli, Tonda, Simeon, Shigdel, Béreux, Vitali, Tangaro, Lahti, Temko, Claesson and Berland.

Keywords

AutoML; colorectal cancer; feature selection; machine learning methods; microbiome data analysis; model selection; predictive modeling; preprocessing; Microbiology; Microbiology (medical); SDG 3 - Good Health and Well-being
