Publication

Enhancing the Quality of ARGO's In Situ Measurements Data Using Machine Learning Algorithms

File: TGI4481.pdf (1.7 MB, Adobe PDF)

Abstract

The ARGO program is a global initiative that collects in situ oceanographic data through autonomous profiling floats, offering consistent and widespread measurements of temperature, salinity, and pressure across the world’s oceans. Despite its broad spatiotemporal coverage, the dataset frequently contains missing values caused by sensor malfunctions, data transmission issues, or environmental limitations. These gaps can undermine the reliability of ocean analyses and climate modeling efforts. This thesis explores the use of Self-Organizing Maps (SOM), an unsupervised machine learning technique, to impute missing values in the ARGO dataset. The methodology follows the CRISP-DM framework and covers data transformation, outlier removal, normalization, and the simulation of missing values for evaluation. The SOM was trained on complete observations using both spatial (latitude and longitude) and oceanographic features. For each incomplete test record, the Best Matching Unit (BMU) on the SOM grid was identified, and nearby neurons were queried to estimate the missing values through a distance-weighted averaging strategy based on geographic proximity. Model performance was evaluated by comparing imputed values against artificially masked ground truth using standard regression metrics, including the coefficient of determination (R²). The SOM performed best on salinity (R² = 0.54), with moderate accuracy on temperature (R² = 0.32) and pressure (R² = 0.30). Although the SOM model did not outperform the K-Nearest Neighbors (KNN) baseline on error metrics, it proved valuable for preserving physical coherence and spatial structure. The study highlights the potential of SOMs for integration into data quality pipelines, especially when interpretability and global pattern recognition are desired.
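The imputation procedure summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: the grid size, learning-rate and neighbourhood schedules, the number of neighbouring neurons `k`, z-score normalization, inverse-distance weighting on normalized latitude/longitude (a simplification instead of a great-circle distance), and the synthetic data are all hypothetical choices. It mirrors the described steps: train on complete records, locate the BMU from the observed features of an incomplete record, average nearby prototypes weighted by geographic proximity, and score the result against artificially masked values with R².

```python
import numpy as np

# Hypothetical column layout: [lat, lon, temperature, salinity, pressure]
GEO = [0, 1]        # spatial columns, assumed always observed
OCEAN = [2, 3, 4]   # oceanographic columns that may be missing


def train_som(data, rows=6, cols=6, epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM training with a Gaussian neighbourhood and linear decay."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Initialise prototypes from randomly chosen (complete) training rows.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5    # decaying neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def impute(record, weights, grid, geo_idx=GEO, k=4, eps=1e-9):
    """Fill NaNs in one record from the BMU and its k nearest grid neighbours."""
    observed = ~np.isnan(record)
    # BMU search uses only the features that are actually present.
    d2 = ((weights[:, observed] - record[observed]) ** 2).sum(axis=1)
    bmu = int(np.argmin(d2))
    # Neighbourhood (assumption): BMU plus the k closest neurons on the grid.
    grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    neigh = np.argsort(grid_d2, kind="stable")[: k + 1]
    # Inverse-distance weights from record-to-prototype lat/lon distance.
    geo_dist = np.sqrt(((weights[neigh][:, geo_idx] - record[geo_idx]) ** 2).sum(axis=1))
    w = 1.0 / (geo_dist + eps)
    filled = record.copy()
    missing = ~observed
    filled[missing] = (w[:, None] * weights[neigh][:, missing]).sum(axis=0) / w.sum()
    return filled


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for ARGO records (NOT real data).
rng = np.random.default_rng(42)
n = 600
lat = rng.uniform(-60, 60, n)
lon = rng.uniform(-180, 180, n)
temp = 28 - 0.35 * np.abs(lat) + rng.normal(0, 1, n)
sal = 35 + 0.01 * lat + rng.normal(0, 0.2, n)
pres = rng.uniform(5, 2000, n)
raw = np.column_stack([lat, lon, temp, sal, pres])

# z-score normalization using training statistics only.
train, test = raw[:500], raw[500:]
mu, sd = train.mean(axis=0), train.std(axis=0)
train_z, test_z = (train - mu) / sd, (test - mu) / sd

weights, grid = train_som(train_z)

# Simulate missingness: hide one oceanographic variable per test record.
masked = test_z.copy()
hidden = rng.choice(OCEAN, size=len(masked))
masked[np.arange(len(masked)), hidden] = np.nan

imputed = np.array([impute(r, weights, grid) for r in masked])

# R² per variable, computed only on the entries that were masked.
scores = {col: r2(test_z[hidden == col, col], imputed[hidden == col, col]) for col in OCEAN}
```

Because the BMU search relies only on the observed features, a record missing any oceanographic variable can still be placed on the map. The sketch assumes latitude and longitude are always present, since the geographic weights depend on them.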

Description

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Business Intelligence

Keywords

ARGO; Self Organizing Maps; Environment Data Imputation; Machine Learning for Oceanography; SDG 13 - Climate action; SDG 14 - Life below water
