Research project
A machine learning-based forecasting system for shellfish safety
Publications
Forecasting biotoxin contamination in mussels across production areas of the Portuguese coast with Artificial Neural Networks
Publication . Cruz, Rafaela C.; Costa, Pedro R.; Krippahl, Ludwig; Lopes, Marta B.; DI - Departamento de Informática; NOVALincs; CMA - Centro de Matemática e Aplicações; Elsevier Science B.V., Amsterdam.
Harmful algal blooms (HABs) and the consequent contamination of shellfish are complex processes that depend on several biotic and abiotic variables, which makes predicting shellfish contamination a challenging task. Not only is the information of interest dispersed among multiple sources, but the complex temporal relationships between the time-series variables also require advanced machine learning methods to model them. In this study, multiple time-series variables measured in Portuguese shellfish production areas were used to forecast shellfish contamination by diarrhetic shellfish poisoning (DSP) toxins one to four weeks in advance. These time series included DSP concentration in mussels (Mytilus galloprovincialis), toxic phytoplankton cell counts, meteorological variables, and remotely sensed oceanographic variables. Several data pre-processing and feature engineering methods were tested, as well as multiple autoregressive and artificial neural network (ANN) models. The best results regarding the mean absolute error of prediction were obtained for a bivariate long short-term memory (LSTM) neural network based on biotoxin and toxic phytoplankton measurements, with higher accuracy for short-term forecasting horizons. When evaluating the ability of all ANN models to predict the contamination state (below or above the regulatory limit for contamination) and changes to this state, multilayer perceptrons (MLP) and convolutional neural networks (CNN) yielded improved predictive performance on a case-by-case basis. These results show that relevant information predictive of DSP contamination in mussels can be extracted from time-series data from multiple sources, placing ANNs as good candidate models to assist the production sector in anticipating harvesting interdictions and mitigating economic losses.
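The two evaluation views named in this abstract, mean absolute error and agreement on the contamination state, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the sample values, and the limit of 160.0 are assumptions chosen for the example, and the abstract does not state the regulatory limit.

```python
from typing import List, Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def contamination_state(values: Sequence[float], limit: float) -> List[bool]:
    """True where a concentration is above the limit (harvesting interdicted)."""
    return [v > limit for v in values]


def state_accuracy(observed: Sequence[float], predicted: Sequence[float], limit: float) -> float:
    """Fraction of time points where the predicted state matches the observed state."""
    obs = contamination_state(observed, limit)
    pred = contamination_state(predicted, limit)
    return sum(o == p for o, p in zip(obs, pred)) / len(obs)


# Hypothetical limit and made-up weekly concentrations, for illustration only.
HYPOTHETICAL_LIMIT = 160.0
observed = [40.0, 120.0, 200.0, 310.0, 90.0]
predicted = [55.0, 170.0, 180.0, 290.0, 60.0]

mae = mean_absolute_error(observed, predicted)
accuracy = state_accuracy(observed, predicted, HYPOTHETICAL_LIMIT)
print(mae, accuracy)
```

The two metrics can disagree: the second prediction above contributes most to the error and is also the only one that falls on the wrong side of the limit, while other errors of similar size leave the state unchanged. That is one plausible reason a model with the lowest error need not be the best at predicting the state.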
Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization
Publication . Peixoto, Carolina; Lopes, Marta B.; Martins, Marta; Casimiro, Sandra; Sobral, Daniel; Grosso, Ana Rita; Abreu, Catarina; Macedo, Daniela; Costa, Ana Lúcia; Pais, Helena; Alvim, Cecília; Mansinho, André; Filipe, Pedro; Costa, Pedro Marques da; Fernandes, Afonso; Borralho, Paula; Ferreira, Cristina; Malaquias, João; Quintela, António; Kaplan, Shannon; Golkaram, Mahdi; Salmans, Michael; Khan, Nafeesa; Vijayaraghavan, Raakhee; Zhang, Shile; Pawlowski, Traci; Godsey, Jim; So, Alex; Liu, Li; Costa, Luís; Vinga, Susana; NOVALincs; CMA - Centro de Matemática e Aplicações; UCIBIO - Applied Molecular Biosciences Unit; DCV - Departamento de Ciências da Vida; BioMed Central (BMC)
Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA-sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in the methods’ accuracy. Further classification using the pre-selected genes found by different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, the use of network-based correlation information (iTwiner) for gene selection produced the best classification results and the identification of more stable and robust gene sets. Some of the selected genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602) and regulation of gene transcription (NME2P2).
We show that the classification of CRC patients based on pre-selected features by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models’ predictive performance. Moreover, the use of correlation-based penalization for biomarker selection stands as a promising strategy for predicting patients’ groups based on RNA-seq data.
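For reference, Elastic Net regularized logistic regression is commonly written as the following objective. This is the standard textbook parametrization, given here as an assumption about notation rather than the paper's own formulation: $x_i$ is the expression vector of patient $i$, $y_i \in \{0, 1\}$ the metastasis label, $\lambda \ge 0$ the penalty strength and $\alpha \in [0, 1]$ the mixing weight. The iTwiner penalty is not reproduced because the abstract does not define it.

```latex
\hat{\beta}_0, \hat{\beta} = \arg\min_{\beta_0,\, \beta}\;
  -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
  - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \Big]
  + \lambda \Big[ \alpha \lVert \beta \rVert_1 + \tfrac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Big]
```

The $\ell_1$ term drives many coefficients to exactly zero, which is what makes the fitted model usable as a gene pre-selection step; the $\ell_2$ term stabilises the selection when genes are correlated.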
Time-Lagged Correlation Analysis of Shellfish Toxicity Reveals Predictive Links to Adjacent Areas, Species, and Environmental Conditions
Publication . Patrício, André; Lopes, Marta B.; Costa, Pedro Reis; Costa, Rafael S.; Henriques, Rui; Martins, Susana de Almeida Mendes Vinga; NOVALincs; CMA - Centro de Matemática e Aplicações; LAQV@REQUIMTE; DQ - Departamento de Química; MDPI - Multidisciplinary Digital Publishing Institute
Diarrhetic Shellfish Poisoning (DSP) is an acute intoxication caused by the consumption of contaminated shellfish, which is common in many regions of the world. To safeguard human health, most countries implement programs focused on the surveillance of toxic phytoplankton abundance and shellfish toxicity levels, an effort that can be complemented by a deeper understanding of the underlying phenomena. In this work, we identify patterns of seasonality in shellfish toxicity across the Portuguese coast and analyse time-lagged correlations between this toxicity and various potential risk factors. We extend the understanding of these relations through the introduction of temporal lags, allowing the analysis of time series at different points in time and the study of the predictive power of the tested variables. This study confirms previous findings about toxicity seasonality patterns on the Portuguese coast and provides further quantitative data about the relations between shellfish toxicity and geographical location, shellfish species, toxic phytoplankton abundances, and environmental conditions. Furthermore, multiple pairs of areas and shellfish species are identified as having correlations high enough to allow for a predictive analysis. These results represent the first step towards understanding the dynamics of DSP toxicity in Portuguese shellfish producing areas, such as temporal and spatial variability, and towards the development of a shellfish safety forecasting system.
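A time-lagged correlation of the kind described here can be sketched in a few lines. This is a minimal illustration assuming Pearson correlation on equally spaced weekly series; the function names and the sample data are hypothetical and the abstract does not specify which correlation measure the authors used.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver: Sequence[float], target: Sequence[float], lag: int) -> float:
    """Correlate driver[t] with target[t + lag] for a non-negative lag."""
    if len(driver) != len(target):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(driver) - 1:
        raise ValueError("lag must be non-negative and leave at least 2 points")
    if lag == 0:
        return pearson(driver, target)
    return pearson(driver[:-lag], target[lag:])


# Made-up weekly series: the target repeats the driver two weeks later.
driver = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
target = [0.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0]

correlations = {lag: lagged_correlation(driver, target, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations[best_lag])
```

A high correlation at a positive lag means the driver series (for example, toxicity in an adjacent area or a phytoplankton count) moves ahead of the target, which is what gives it potential predictive value.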
Forecasting shellfish contamination by marine biotoxins based on multivariate time series
Publication . Cruz, Rafaela Carreira Eleutério Gregório da; Lopes, Marta; Krippahl, Ludwig
Shellfish production has been growing in recent years, having a high impact in several Portuguese coastal regions. However, this resource can be contaminated by marine biotoxins produced by toxic phytoplankton. The consumption of contaminated shellfish can cause serious health problems, and hence the harvesting and commercialisation of this product are prohibited whenever biotoxin concentration exceeds the safety limits. Since this prohibition leads to severe economic losses, it becomes necessary to develop strategies that predict shellfish contamination. Biotoxin concentration in bivalve molluscs can be predicted using univariate and multivariate time series, by modelling past information to predict the future. These time series include historical data on in-situ measurements of biotoxin concentration in several shellfish species, as well as other biological and meteorological data. In this thesis, multiple time series were acquired from different sources, integrated and pre-processed. Afterwards, various univariate and multivariate time series forecasting methods were developed to predict mussel contamination in multiple production areas. In this context, autoregressive models and artificial neural networks (ANNs), such as feed-forward, convolutional and long short-term memory (LSTM) networks, were tested. Additionally, various data preparation and feature engineering methods were explored to improve these models. The forecasting models were evaluated and compared in order to determine which are the most suitable to solve the problem at hand. The results showed that the ANNs, namely networks trained on data whose dimension had previously been reduced using an autoencoder and networks trained on univariate time series, outperformed the classic autoregressive models. Moreover, among the ANN models, the LSTMs were very accurate, especially at one-week-ahead predictions. Finally, the multivariate models did not outperform the univariate models, which may be explained by the fact that the additional variables used in this thesis did not provide relevant information to forecast shellfish contamination. These results might be regarded as the first pivotal steps towards the development of a model-based forecasting tool, which will allow the production sector to anticipate the harvesting prohibition, enabling the development of strategies to mitigate the economic losses inherent to this situation.
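Modelling past information to predict the future usually starts by turning a series into supervised examples with a sliding window. The sketch below shows one common way to do that for a univariate weekly series; the function name, the window length and the sample values are assumptions for illustration, not the thesis implementation.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs from a univariate series.

    Each input holds n_lags consecutive values; its target is the value
    observed `horizon` steps after the last value of that window.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        end = start + n_lags  # index just past the window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Made-up weekly biotoxin concentrations, for illustration only.
weekly = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

# Three past weeks as input, predicting two weeks ahead.
X, y = make_windows(weekly, n_lags=3, horizon=2)
print(X, y)
```

The same pairs can feed an autoregressive model or a neural network; a multivariate variant would stack the windows of several series into each input. Longer horizons leave fewer training examples, since each extra week removes one usable window from the end of the series.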
Funders
Funding entity
Fundação para a Ciência e a Tecnologia
Funding programme
3599-PPCDT
Award number
DSAIPA/DS/0026/2019
