| Name | Description | Size | Format |
|---|---|---|---|
| | | 4.74 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The general objective of this study was to demonstrate the value that the reuse of open data can represent for the economic impact of a company with an online store. Collecting information (product data) from open sources (online stores) is one of the opportunities for exploring the online market.
To this end, the study addressed issues inherent to creating value from open data through the architecture and implementation of a data reuse platform that relies exclusively on public information from the main online retail stores in Portugal. The result is a platform that combines data from different retailers to support a broad, rich, and accurate exploration of the online retail market in real time.
Regarding the implementation of the platform, the dissertation covers all the steps needed to build a scalable, automated tool that makes it easier to access and collect information, content, and products from online stores (Websites). This yields efficiency gains in Data Analytics, since real-time data enables advanced analyses and thus contributes to a deeper understanding of the market. With a focus on the new paradigms of Data Science and on the importance of technologies that add value to projects in this area, the application is built on a Serverless architecture in the Amazon Web Services (AWS) cloud and uses Web Scraping and Web Crawling techniques for data extraction, including solutions that respond to the various protections used by Websites (online stores).
In line with the main objective, once the information has been collected, transformed, and stored, a data analysis layer is developed to observe and measure the importance of data in the online retail market in Portugal.
In summary, the concepts addressed and discussed in detail in this dissertation are: the reuse of open data; Web Scraping and Web Crawling techniques; solutions to the defenses that online portals implement; the advantages and challenges of Serverless architectures; and the construction of analyses aimed at creating value in understanding the business.
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Keywords
Open Data; Data Analysis; Serverless Architecture; Web Scraping; Web Crawling; E-commerce; ETL; SDG 8 - Decent work and economic growth
