Please use this identifier to cite or link to this item: http://hdl.handle.net/10362/164621
Title: Overview of data preprocessing for machine learning applications in human microbiome research
Authors: Ibrahimi, Eliana
Lopes, Marta B.
Dhamo, Xhilda
Simeon, Andrea
Shigdel, Rajesh
Hron, Karel
Stres, Blaž
D’Elia, Domenica
Berland, Magali
Marcos-Zambrano, Laura Judith
Keywords: compositionality
data preprocessing
human microbiome
machine learning
metagenomics data
normalization
Microbiology
Microbiology (medical)
Date: Oct-2023
Abstract: Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
Description: This article is based upon work from COST Action ML4Microbiome "Statistical and machine learning techniques in human microbiome studies" (CA18131), supported by COST (European Cooperation in Science and Technology), www.cost.eu. KH acknowledges support through the HiTEc COST Action CA21163 and the project PID2021-123833OB-I00 provided by the Spanish Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) and ERDF A way of making Europe. MB acknowledges support through the Metagenopolis grant ANR-11-DPBS-0001. LM-Z is supported by Juan de la Cierva Grant (IJC2019-042188-I) from the Spanish State Research Agency of the Spanish Ministerio de Ciencia e Innovación y Ministerio de Universidades. Publisher Copyright: Copyright © 2023 Ibrahimi, Lopes, Dhamo, Simeon, Shigdel, Hron, Stres, D'Elia, Berland and Marcos-Zambrano.
Peer review: yes
URI: http://hdl.handle.net/10362/164621
DOI: https://doi.org/10.3389/fmicb.2023.1250909
ISSN: 1664-302X
Appears in Collections: Home collection (FCT)

Files in This Item:
File | Description | Size | Format
Overview_of_data_preprocessing.pdf | | 1.38 MB | Adobe PDF

All items in the repository are protected by copyright, with all rights reserved.