Pinheiro, Flávio Luís Portas
Lopes, Fábio Rafael Santos
2024-03-19
2024-03-19
2024-02-05
http://hdl.handle.net/10362/165104
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
This thesis presents an in-depth exploration of the Lakehouse Data Architecture. This paradigm merges the strengths of data lakes and data warehouses, enabling organizations to harness the full potential of their data. The research investigates the architectural components, operational mechanisms, and strategic implications of implementing a Lakehouse within an organization using advanced technologies like Microsoft Azure, Google Cloud Platform, Databricks, Apache Spark, Delta Lake, and Dremio. The study also scrutinizes the Lakehouse's ability to facilitate a data-centric culture by integrating advanced analytics into business processes. The thesis further delves into the FAIR data principles, advocating for data to be Findable, Accessible, Interoperable, and Reusable, and the Data Mesh concept, a decentralized data management approach. The research concludes that the Lakehouse architecture provides a comprehensive and robust framework for managing vast and diverse data sets, optimizing data pipeline performance, reducing redundancy, and enhancing data security. It underscores the pivotal role of the Lakehouse in driving strategic innovation and positions it as a flexible and adaptable model for future technological advancements in AI and machine learning. The insights offered in this thesis serve as a guide for organizations aiming to navigate the complexities of becoming data-centric and underscore the transformative power of modern data platforms.
eng
Data Mesh; Databricks; Data Lakehouse; Delta Lake; Microsoft Azure; Apache Spark; SDG 9 - Industry, innovation and infrastructure
Lakehouse Data Architecture: Data as a first-class citizen within an organization
master thesis
203553047