| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.86 MB | Adobe PDF |
Advisor(s)
Abstract(s)
Self-Organizing Maps are powerful models capable of mapping high-dimensional data into a
topology-preserving two-dimensional space. They have been commonly applied to
geodemographic tasks, whose importance has been steadily growing. Geodemographic
segmentations, in particular, have become increasingly popular in both the public and private
sectors. We tested the impact of the introduction of Self-Organizing Maps in the
geodemographic clustering process by comparing the performance of K-Means Clustering and
Hierarchical Clustering when trained on SOM weights to their performance when trained on
the input data directly. For this purpose, we used Censos 2021 data pertaining to the Lisbon
Metropolitan Area (LMA), divided by subsections according to the BGRI 2021. Various
combinations of preprocessing steps and SOM hyperparameters were tested, culminating in a
model trained with the following hyperparameter set: x = 45, y = 45, σ = 5,
learning rate = 0.3. This map was trained for 150,000 iterations. Results showed that although
K-Means performed better than Hierarchical Clustering for this data, the introduction of the
SOM in the process did not strongly impact performance, resulting in clusters of similar quality
when evaluated by means of Silhouette Score, Davies-Bouldin Index and Calinski-Harabasz
Index. After the impact assessment, a K-Means Clustering model trained on SOM weights was
used to produce a segmentation of the LMA subsections, and those segments were then
analyzed. The second goal of this work was to produce a SOM-based interactive visualization
tool that could be used to explore and analyze geodemographic clustering outputs. By using
the SOM grid and a geographic map as the main components, a dashboard consisting of two
main tabs was created. The first tab focuses on cluster label distribution throughout the SOM
grid map and the geographic space, with the option to visualize the SOM’s distance map to
analyze its structure. The second tab dives into single-variable analysis, turning the SOM grid
into a component planes plot and the geographic map into a choropleth map. Across both
views, many interactivity features are available, either through additional widgets or by
direct selection of plot elements. We believe this approach to visual interactivity and
exploration of SOM-based clustering outputs can bring value to a variety of contexts where
static visualization of results is the norm.
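
The two-stage approach described in the abstract (train a SOM, run K-Means on the SOM weights, then give each observation the cluster of its best-matching unit) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The helper names (`train_som`, `kmeans`, `som_kmeans_labels`), the linear decay schedule, the sample-based initialization, the toy two-blob data, and the small 5 x 5 grid are all assumptions made for illustration; the abstract reports x = 45, y = 45, σ = 5, learning rate = 0.3 and 150,000 iterations on census data. The sketch uses only the Python standard library and does not compute the evaluation metrics.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_som(data, x, y, sigma, learning_rate, n_iter, seed=0):
    """Online SOM training with a Gaussian neighbourhood.

    The linear decay of sigma and learning rate is an assumption.
    """
    rng = random.Random(seed)
    # Initialise each unit's weight vector from a random input sample.
    weights = {(i, j): list(rng.choice(data)) for i in range(x) for j in range(y)}
    for t in range(n_iter):
        decay = 1.0 - t / n_iter
        lr = learning_rate * decay
        sig = max(sigma * decay, 0.5)  # floor keeps the neighbourhood usable
        sample = rng.choice(data)
        # Best-matching unit: the unit whose weights are closest to the sample.
        bmu = min(weights, key=lambda u: sq_dist(weights[u], sample))
        for (i, j), w in weights.items():
            grid_d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = lr * math.exp(-grid_d2 / (2 * sig ** 2))
            for k in range(len(w)):
                w[k] += h * (sample[k] - w[k])
    return weights


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def som_kmeans_labels(data, weights, k, seed=0):
    """Cluster the SOM units, then label each row via its best-matching unit."""
    units = list(weights)
    _, unit_labels = kmeans([weights[u] for u in units], k, seed=seed)
    label_of_unit = dict(zip(units, unit_labels))
    return [label_of_unit[min(units, key=lambda u: sq_dist(weights[u], row))]
            for row in data]


# Toy data: two Gaussian blobs standing in for standardized census variables.
rng = random.Random(42)
blob_a = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(60)]
blob_b = [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(60)]
data = blob_a + blob_b

weights = train_som(data, x=5, y=5, sigma=1.5, learning_rate=0.3, n_iter=2000)
labels = som_kmeans_labels(data, weights, k=2)
```

For the comparison the abstract describes, the same `kmeans` helper would be run directly on `data`, and both label sets would be scored with the Silhouette Score, Davies-Bouldin Index and Calinski-Harabasz Index.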
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
Keywords
Self-Organizing Maps; SOM; Geodemographics; Geodemographic Clustering; Census; Clustering; Visualization; Dashboard; SDG 11 - Sustainable cities and communities
