Publication

Self-Organizing Maps in Geodemographic Analytics: A Clustering and Visualization Study Using Lisbon Censos Data

datacite.subject.fos: Natural Sciences::Computer and Information Sciences [pt_PT]
dc.contributor.advisor: Bação, Fernando José Ferreira Lucas
dc.contributor.author: Fernandes, José Maria Guimarães Ramirez
dc.date.accessioned: 2025-11-14T15:43:35Z
dc.date.available: 2025-11-14T15:43:35Z
dc.date.issued: 2025-10-31
dc.description: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics [pt_PT]
dc.description.abstract: Self-Organizing Maps are powerful models capable of mapping high-dimensional data into a topology-preserving two-dimensional space. They have been commonly applied to geodemographic tasks, whose importance has been steadily growing. Geodemographic segmentations, in particular, have become increasingly popular in both the public and private sectors. We tested the impact of introducing Self-Organizing Maps into the geodemographic clustering process by comparing the performance of K-Means Clustering and Hierarchical Clustering when trained on SOM weights with their performance when trained directly on the input data. For this purpose, we used Censos 2021 data pertaining to the Lisbon Metropolitan Area (LMA), divided into subsections according to the BGRI 2021. Various combinations of preprocessing steps and SOM hyperparameters were tested, culminating in a model trained with the following hyperparameter set: x = 45, y = 45, σ = 5, learning rate = 0.3. This map was trained for 150000 iterations. Results showed that although K-Means performed better than Hierarchical Clustering for this data, the introduction of the SOM in the process did not strongly impact performance, resulting in clusters of similar quality when evaluated by means of Silhouette Score, Davies-Bouldin Index and Calinski-Harabasz Index. After the impact assessment, a K-Means Clustering model trained on SOM weights was used to produce a segmentation of the LMA subsections, and those segments were then analyzed. The second goal of this work was to produce a SOM-based interactive visualization tool that could be used to explore and analyze geodemographic clustering outputs. By using the SOM grid and a geographic map as the main components, a dashboard consisting of two main tabs was created.
The first tab focuses on cluster label distribution throughout the SOM grid map and the geographic space, with the option to visualize the SOM’s distance map to analyze its structure. The second tab dives into single-variable analysis, turning the SOM grid into a component planes plot and the geographic map into a choropleth map. Across both views, many interactivity features are available, either through additional widgets or by direct selection of plot elements. We believe this approach to visual interactivity and exploration of SOM-based clustering outputs can bring value to a variety of contexts where static visualization of results is the norm. [pt_PT]
dc.identifier.tid: 204071321
dc.identifier.uri: http://hdl.handle.net/10362/190748
dc.language.iso: eng [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Self-Organizing Maps [pt_PT]
dc.subject: SOM [pt_PT]
dc.subject: Geodemographics [pt_PT]
dc.subject: Geodemographic Clustering [pt_PT]
dc.subject: Census [pt_PT]
dc.subject: Clustering [pt_PT]
dc.subject: Visualization [pt_PT]
dc.subject: Dashboard [pt_PT]
dc.subject: SDG 11 - Sustainable cities and communities [pt_PT]
dc.title: Self-Organizing Maps in Geodemographic Analytics: A Clustering and Visualization Study Using Lisbon Censos Data [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]
thesis.degree.name: Master's in Data Science and Advanced Analytics, specialization in Business Analytics [pt_PT]
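
The abstract describes training a SOM with a grid size (x, y), a neighborhood width σ and a learning rate, and then clustering the resulting SOM weights. The record does not say which library or decay schedule the thesis used, so the following is only a minimal, standard-library sketch of a generic online SOM under stated assumptions: the function names (`train_som`, `best_matching_unit`), the linear decay of the learning rate and σ, the toy two-blob data, and the small 4 x 4 grid with 500 iterations (standing in for the reported 45 x 45 grid and 150000 iterations) are all illustrative choices, not the author's implementation.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return grid coordinates (i, j) of the node whose weights are closest to the sample."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(sample, w))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, x, y, sigma, learning_rate, n_iter, seed=0):
    """Train an x-by-y rectangular SOM online; returns weights[i][j] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(y)]
               for _ in range(x)]
    for t in range(n_iter):
        # Assumed schedule: learning rate and sigma both decay linearly.
        frac = 1.0 - t / n_iter
        lr = learning_rate * frac
        sig = max(sigma * frac, 0.5)  # floor keeps the neighborhood from vanishing
        sample = rng.choice(data)
        bi, bj = best_matching_unit(weights, sample)
        for i in range(x):
            for j in range(y):
                # Gaussian neighborhood centered on the best matching unit.
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sig * sig))
                w = weights[i][j]
                for k in range(dim):
                    w[k] += lr * h * (sample[k] - w[k])
    return weights


# Toy data: two 2-D blobs clipped to [0, 1] (hypothetical, not census data).
data_rng = random.Random(42)


def blob(cx, cy, n):
    clip = lambda v: min(1.0, max(0.0, v))
    return [[clip(data_rng.gauss(cx, 0.05)), clip(data_rng.gauss(cy, 0.05))]
            for _ in range(n)]


data = blob(0.2, 0.2, 50) + blob(0.8, 0.8, 50)

som = train_som(data, x=4, y=4, sigma=1.5, learning_rate=0.3, n_iter=500)

# Flatten the grid: these weight vectors are what a clustering model would be
# trained on in the "trained on SOM weights" variant the abstract compares.
codebook = [w for row in som for w in row]
print(len(codebook), "weight vectors of dimension", len(codebook[0]))
```

In the comparison the abstract describes, a clustering model such as K-Means would be fitted once on `codebook` and once on `data` directly, and the two results scored with the same quality indices.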

Files

Main
Name: TCDMAA3327.pdf
Size: 2.86 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 348 B
Format: Item-specific license agreed upon to submission
Description: