| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.7 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
The ARGO program is a global initiative that collects in situ oceanographic data through
autonomous profiling floats, offering consistent and widespread measurements of
temperature, salinity, and pressure across the world’s oceans. Despite its broad
spatiotemporal coverage, the dataset frequently contains missing values due to sensor
malfunctions, data transmission issues, or environmental limitations. These gaps can
undermine the reliability of ocean analyses and climate modeling efforts. This thesis explores
the use of Self-Organizing Maps (SOM), an unsupervised machine learning technique, for
imputing missing values in the ARGO dataset. The methodology follows the CRISP-DM
framework, encompassing data transformation, outlier removal, normalization, and
simulation of missing values for evaluation. SOM was trained on complete observations using
both spatial (latitude and longitude) and oceanographic features. For each incomplete test
record, the Best Matching Unit (BMU) on the SOM grid was identified, and nearby neurons
were queried to estimate missing values through a distance-weighted averaging strategy
based on geographic proximity. Model performance was evaluated by comparing imputed
values against artificially masked ground truth using standard metrics. Results show that SOM
performed best on salinity (𝑅² = 0.54), with moderate accuracy on temperature (𝑅² = 0.32)
and pressure (𝑅² = 0.30). While the SOM model did not outperform the K-Nearest Neighbors
(KNN) baseline in terms of error metrics, it demonstrated value in preserving physical
coherence and spatial structure. The study highlights SOM’s potential for integration into data
quality pipelines, especially when interpretability and global pattern recognition are desired.
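
The imputation step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the SOM is a small hand-written NumPy version, the data are synthetic, and the names `train_som` and `impute`, the grid size, the neighborhood `radius`, and the inverse-distance weighting are all assumptions chosen for illustration. The sketch also assumes latitude and longitude (columns 0 and 1) are never missing and that all features are already on comparable scales.

```python
import numpy as np

def train_som(data, grid=(6, 6), epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM online; returns (weights, grid coordinates)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each neuron from a randomly chosen complete observation.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                 # decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.5     # shrinking neighborhood width
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, coords

def impute(record, weights, coords, radius=1.5, geo_cols=(0, 1)):
    """Fill NaNs in one record from neurons near its BMU (assumed weighting scheme)."""
    known = ~np.isnan(record)
    # The BMU is found using only the features that are present.
    bmu = np.argmin(((weights[:, known] - record[known]) ** 2).sum(axis=1))
    # Query neurons within `radius` of the BMU on the SOM grid (BMU included).
    near = np.linalg.norm(coords - coords[bmu], axis=1) <= radius
    geo = list(geo_cols)
    # Weight each neighbor by inverse distance in (lat, lon) space.
    geo_dist = np.linalg.norm(weights[near][:, geo] - record[geo], axis=1)
    w = 1.0 / (geo_dist + 1e-6)
    filled = record.copy()
    filled[~known] = (w[:, None] * weights[near][:, ~known]).sum(axis=0) / w.sum()
    return filled

# Synthetic stand-in for scaled ARGO records: lat, lon, temperature, salinity, pressure.
rng = np.random.default_rng(42)
n = 400
lat, lon = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
temp = 1 - lat + 0.05 * rng.normal(size=n)
sal = 0.5 * lat + 0.5 * lon + 0.05 * rng.normal(size=n)
pres = rng.uniform(0, 1, n)
data = np.column_stack([lat, lon, temp, sal, pres])
train, test = data[:300], data[300:]

weights, coords = train_som(train)

# Simulate missing values by masking salinity in the test records.
masked = test.copy()
masked[:, 3] = np.nan
imputed = np.array([impute(r, weights, coords) for r in masked])

# Compare imputed values against the masked ground truth.
ss_res = ((test[:, 3] - imputed[:, 3]) ** 2).sum()
ss_tot = ((test[:, 3] - test[:, 3].mean()) ** 2).sum()
print("R^2 on masked salinity:", 1 - ss_res / ss_tot)
```

Because each imputed value is a weighted average of neuron weights, it always stays within the range of the training data for that feature, which is one way a SOM-based approach can keep estimates physically plausible. The score printed here reflects only the synthetic data and says nothing about the results reported in the thesis.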
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, with a specialization in Business Intelligence
Keywords
ARGO; Self Organizing Maps; Environment Data Imputation; Machine Learning for Oceanography; SDG 13 - Climate action; SDG 14 - Life below water
