Publication

Lakehouse Data Architecture: Data as a first-class citizen within an organization

datacite.subject.fos: Natural Sciences::Computer and Information Sciences [pt_PT]
dc.contributor.advisor: Pinheiro, Flávio Luís Portas
dc.contributor.author: Lopes, Fábio Rafael Santos
dc.date.accessioned: 2024-03-19T11:24:33Z
dc.date.available: 2024-03-19T11:24:33Z
dc.date.issued: 2024-02-05
dc.description: Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science [pt_PT]
dc.description.abstract: This thesis presents an in-depth exploration of the Lakehouse Data Architecture. This paradigm merges the strengths of data lakes and data warehouses, enabling organizations to harness the full potential of their data. The research investigates the architectural components, operational mechanisms, and strategic implications of implementing a Lakehouse within an organization using advanced technologies like Microsoft Azure, Google Cloud Platform, Databricks, Apache Spark, Delta Lake, and Dremio. The study also scrutinizes the Lakehouse's ability to facilitate a data-centric culture by integrating advanced analytics into business processes. The thesis further delves into the FAIR data principles, advocating for data to be Findable, Accessible, Interoperable, and Reusable, and the Data Mesh concept, a decentralized data management approach. The research concludes that the Lakehouse architecture provides a comprehensive and robust framework for managing vast and diverse data sets, optimizing data pipeline performance, reducing redundancy, and enhancing data security. It underscores the pivotal role of the Lakehouse in driving strategic innovation and positions it as a flexible and adaptable model for future technological advancements in AI and machine learning. The insights offered in this thesis serve as a guide for organizations aiming to navigate the complexities of becoming data-centric and underscore the transformative power of modern data platforms. [pt_PT]
dc.identifier.tid: 203553047 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10362/165104
dc.language.iso: eng [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Data Mesh [pt_PT]
dc.subject: Databricks [pt_PT]
dc.subject: Data Lakehouse [pt_PT]
dc.subject: Delta Lake [pt_PT]
dc.subject: Microsoft Azure [pt_PT]
dc.subject: Apache Spark [pt_PT]
dc.subject: SDG 9 - Industry, innovation and infrastructure [pt_PT]
dc.title: Lakehouse Data Architecture: Data as a first-class citizen within an organization [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]
thesis.degree.name: Master's in Data Science and Advanced Analytics, specialization in Data Science [pt_PT]

Files