Sem título

Financiador

Organização

Publicações

Publication . Alexandre, Leonardo; Costa, R. S.; Henriques, Rui; LAQV@REQUIMTE; DQ - Departamento de Química; PLOS - Public Library of Science

Pattern discovery and subspace clustering play a central role in the biological domain, supporting for instance putative regulatory module discovery from omics data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy delineate discriminative power properties, well-established in the presence of categorical outcomes, yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to evaluate patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm the possibility to soundly extend discriminative criteria towards numerical outcomes without the drawbacks well-associated with discretization procedures. Results from four case studies confirm the validity and relevance of the proposed methods, further unveiling critical directions for research on biotechnology and biomedicine. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.

2022-10-19Artigo científico

Acesso aberto

Ver mais

DI2

Publication . Alexandre, Leonardo; Costa, Rafael S.; Henriques, Rui; LAQV@REQUIMTE; DQ - Departamento de Química; BioMed Central (BMC)

Background: A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require a form of data discretization. Although diverse discretization approaches have been proposed, they generally work under a strict set of statistical assumptions which are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics are able to assign multiple items to values occurring near discretization boundaries for superior robustness, there are no reference principles on how to perform multi-item discretizations. Results: In this study, an unsupervised discretization method, DI2, for variables with arbitrarily skewed distributions is proposed. Statistical tests applied to assess differences in performance confirm that DI2 generally outperforms well-established discretizations methods with statistical significance. Within classification tasks, DI2 displays either competitive or superior levels of predictive accuracy, particularly delineate for classifiers able to accommodate border values. Conclusions: This work proposes a new unsupervised method for data discretization, DI2, that takes into account the underlying data regularities, the presence of outlier values disrupting expected regularities, as well as the relevance of border values. DI2 is available at https://github.com/JupitersMight/DI2

2021-12Artigo científico

Acesso aberto

Ver mais

Entidade financiadora

Fundação para a Ciência e a Tecnologia

Programa de financiamento

3599-PPCDT

Número da atribuição

DSAIPA/DS/0111/2018

Sem título

Financiador

Autores

Publicações

Unidades organizacionais

Descrição

Palavras-chave

Contribuidores

Financiadores

Entidade financiadora

Programa de financiamento

Número da atribuição

ID