Publication

Self-Organizing Maps in Geodemographic Analytics: A Clustering and Visualization Study Using Lisbon Censos Data

Name: TCDMAA3327.pdf
Size: 2.86 MB
Format: Adobe PDF

Abstract

Self-Organizing Maps (SOMs) are powerful models capable of mapping high-dimensional data onto a topology-preserving two-dimensional space. They have commonly been applied to geodemographic tasks, whose importance keeps growing. Geodemographic segmentations, in particular, have become increasingly popular in both the public and private sectors. We tested the impact of introducing SOMs into the geodemographic clustering process by comparing the performance of K-Means Clustering and Hierarchical Clustering when trained on SOM weights with their performance when trained directly on the input data. For this purpose, we used Censos 2021 data for the Lisbon Metropolitan Area (LMA), divided into subsections according to the BGRI 2021. Various combinations of preprocessing steps and SOM hyperparameters were tested, culminating in a 45 × 45 map trained with the following hyperparameters: x = 45, y = 45, σ = 5, learning rate = 0.3. This map was trained for 150,000 iterations. Results showed that, although K-Means performed better than Hierarchical Clustering on this data, introducing the SOM into the process did not strongly affect performance: the resulting clusters were of similar quality as measured by the Silhouette Score, the Davies-Bouldin Index and the Calinski-Harabasz Index. After this impact assessment, a K-Means Clustering model trained on SOM weights was used to produce a segmentation of the LMA subsections, and the resulting segments were analyzed. The second goal of this work was to produce a SOM-based interactive visualization tool for exploring and analyzing geodemographic clustering outputs. Using the SOM grid and a geographic map as its main components, a dashboard with two main tabs was created. The first tab focuses on the distribution of cluster labels across the SOM grid and the geographic space, with the option to display the SOM's distance map to analyze its structure.
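The two-level pipeline compared above can be sketched in NumPy: train a SOM, cluster its codebook (the SOM weights) with K-Means, and give each observation the cluster of its best-matching unit (BMU), so that the segmentation can be scored on the original data. This is a minimal sketch under stated assumptions, not the dissertation's code: the function names are hypothetical, the random-sample initialisation and the linear decay of σ and learning rate are assumptions, the defaults only mirror the reported hyperparameters (45 × 45 map, σ = 5, learning rate = 0.3, 150,000 iterations), and the toy blobs stand in for standardised census indicators. Hierarchical Clustering and the Silhouette Score are left out for brevity; scikit-learn provides `KMeans` and `AgglomerativeClustering` in `sklearn.cluster`, and `silhouette_score`, `davies_bouldin_score` and `calinski_harabasz_score` in `sklearn.metrics`.

```python
import numpy as np


def train_som(data, x=45, y=45, sigma=5.0, learning_rate=0.3,
              n_iter=150_000, seed=0):
    """Online SOM training with a Gaussian neighbourhood on an x-by-y grid.

    Defaults mirror the hyperparameters reported in the abstract; the
    random-sample initialisation and linear decay are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Initialise the codebook with randomly drawn observations.
    weights = data[rng.integers(0, len(data), size=x * y)].astype(float)
    # (row, col) position of every neuron on the map.
    grid = np.array([(i, j) for i in range(x) for j in range(y)], dtype=float)
    for t in range(n_iter):
        decay = 1.0 - t / n_iter               # linear decay from 1 towards 0
        lr_t = learning_rate * decay
        sigma_t = max(sigma * decay, 0.5)      # keep a minimal neighbourhood
        sample = data[rng.integers(0, len(data))]
        bmu = np.argmin(((weights - sample) ** 2).sum(axis=1))
        # Neurons near the BMU on the grid move most towards the sample.
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma_t ** 2))
        weights += lr_t * h[:, None] * (sample - weights)
    return weights.reshape(x, y, data.shape[1])


def sq_dists(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    d = (a ** 2).sum(axis=1)[:, None] - 2 * a @ b.T + (b ** 2).sum(axis=1)[None, :]
    return np.maximum(d, 0.0)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means with random initial centroids."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = sq_dists(points, centers).argmin(axis=1)
        new = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    return sq_dists(points, centers).argmin(axis=1), centers


def som_kmeans_labels(data, codebook, k, seed=0):
    """Cluster the SOM codebook, then give each observation its BMU's label."""
    units = codebook.reshape(-1, codebook.shape[-1])
    unit_labels, _ = kmeans(units, k, seed=seed)
    bmus = sq_dists(data, units).argmin(axis=1)
    return unit_labels[bmus]


def davies_bouldin(data, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij."""
    ids = np.unique(labels)
    cents = np.array([data[labels == c].mean(axis=0) for c in ids])
    scatter = np.array([np.linalg.norm(data[labels == c] - cents[n], axis=1).mean()
                        for n, c in enumerate(ids)])
    sep = np.sqrt(sq_dists(cents, cents))
    np.fill_diagonal(sep, np.inf)              # ignore each cluster's self-pair
    return ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1).mean()


def calinski_harabasz(data, labels):
    """Between- to within-cluster dispersion, scaled by degrees of freedom."""
    ids = np.unique(labels)
    n, k = len(data), len(ids)
    mean = data.mean(axis=0)
    between = within = 0.0
    for c in ids:
        members = data[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * ((centroid - mean) ** 2).sum()
        within += ((members - centroid) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy stand-in for standardised census indicators: three Gaussian blobs.
    data = np.vstack([rng.normal(m, 0.3, size=(100, 4)) for m in (0.0, 2.0, 4.0)])
    codebook = train_som(data, x=10, y=10, sigma=2.0, n_iter=5_000)
    for name, labels in [("SOM + K-Means", som_kmeans_labels(data, codebook, k=3)),
                         ("K-Means on raw data", kmeans(data, k=3)[0])]:
        print(f"{name}: DB={davies_bouldin(data, labels):.3f}, "
              f"CH={calinski_harabasz(data, labels):.1f}")
```

Scoring both label sets on the same original data is what makes the comparison fair: lower Davies-Bouldin and higher Calinski-Harabasz values indicate more compact, better-separated clusters. Clustering the codebook rather than the raw rows also means K-Means runs on a fixed number of units (2,025 for a 45 × 45 map) regardless of how many subsections there are.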
The second tab is dedicated to single-variable analysis, turning the SOM grid into a component-plane plot and the geographic map into a choropleth map. Both views offer many interactive features, accessible through additional widgets or by directly selecting plot elements. We believe this approach to the interactive exploration of SOM-based clustering outputs can add value in the many contexts where static visualization of results is the norm.
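The two dashboard views rest on two standard SOM summaries: the distance map (often called the U-matrix), which shows how far each neuron's weight vector lies from those of its grid neighbours, so that high values mark borders between regions of the map; and component planes, which show one input variable's weight across the grid. The following is a minimal NumPy sketch, assuming the trained map is stored as an `(x, y, n_features)` array; the function names are hypothetical, and the 8-connected rectangular neighbourhood and [0, 1] scaling are choices of this sketch, not the dissertation's documented implementation. MiniSom, a widely used Python SOM library, provides a comparable `distance_map()` method.

```python
import numpy as np


def distance_map(codebook):
    """U-matrix: mean distance from each neuron's weights to its grid neighbours.

    Assumes a rectangular grid with 8-connected neighbours and scales the
    result to [0, 1]; hexagonal grids or 4-connected neighbourhoods would
    change which cells count as neighbours.
    """
    x, y, _ = codebook.shape
    umatrix = np.zeros((x, y))
    for i in range(x):
        for j in range(y):
            dists = [np.linalg.norm(codebook[i, j] - codebook[a, b])
                     for a in range(max(i - 1, 0), min(i + 2, x))
                     for b in range(max(j - 1, 0), min(j + 2, y))
                     if (a, b) != (i, j)]
            umatrix[i, j] = np.mean(dists) if dists else 0.0
    peak = umatrix.max()
    # Avoid dividing by zero when every neuron has identical weights.
    return umatrix / peak if peak > 0 else umatrix


def component_plane(codebook, feature_index):
    """Values of a single input variable across the SOM grid (one plane per variable)."""
    return codebook[:, :, feature_index].copy()


if __name__ == "__main__":
    # Toy 2x2 codebook with one variable: the bottom-right neuron stands apart.
    toy = np.array([[[0.0], [0.0]], [[0.0], [3.0]]])
    print(distance_map(toy))
    print(component_plane(toy, 0))
```

Either array can be drawn as a heatmap (for instance with matplotlib's `imshow`); in the toy example the isolated neuron receives the highest U-matrix value, which is how cluster borders show up on a real map.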

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics

Keywords

Self-Organizing Maps; SOM; Geodemographics; Geodemographic Clustering; Census; Clustering; Visualization; Dashboard; SDG 11 - Sustainable cities and communities
