| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.1 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
Nowadays, corporations have taken a strong interest in data. Companies increasingly realize that mining
data is the key to improving their efficiency and effectiveness and to better understanding their
customers' needs and preferences. However, as the amount of data grows, so do companies' needs for
storage capacity and for ensuring data quality to obtain more accurate insights. As such, new data storage
methods must be considered, evolving from old ones while still preserving data integrity.
Migrating a company's data from an older method, such as a Data Warehouse, to a new one, such as Google
Cloud Platform, is an elaborate task, even more so when data quality needs to be assured and sensitive
data, like Personally Identifiable Information, needs to be anonymized in a Cloud computing environment.
To ensure these points, profiling data before or after it is migrated has significant value: it designs a
profile for the data available in each data source (e.g., databases, files, and others) based on statistics,
metadata information, and pattern rules. This ensures that data quality stays within reasonable standards,
as measured by statistical metrics, and that all Personally Identifiable Information is identified and
anonymized accordingly. This work describes the process by which profiling Data Warehouse data can
improve data quality to support a better migration to the Cloud.
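The profiling idea in the abstract (statistics plus pattern rules, followed by anonymization of flagged columns) can be illustrated with a minimal sketch. This is not the report's implementation: the regular expressions, the 80% match threshold, the salted-hash masking, and every function name below are assumptions chosen for illustration, and the sketch uses only the Python standard library rather than Pandas Profiling or any Google Cloud service.

```python
import hashlib
import re

# Hypothetical pattern rules; a real project would tune these to its own data.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,14}$"),
}


def profile_column(values):
    """Return basic statistics and the PII pattern (if any) most values match."""
    non_null = [v for v in values if v is not None and v != ""]
    profile = {
        "count": len(values),
        "null_ratio": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "pii_type": None,
    }
    for name, pattern in PII_PATTERNS.items():
        matches = sum(1 for v in non_null if pattern.match(str(v)))
        # Assumed threshold: flag the column when >= 80% of non-null values match.
        if non_null and matches / len(non_null) >= 0.8:
            profile["pii_type"] = name
            break
    return profile


def anonymize(value, salt="example-salt"):
    """Replace a value with a salted SHA-256 digest (a form of pseudonymization)."""
    if value is None or value == "":
        return value
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def profile_and_anonymize(table):
    """Profile each column of a dict-of-lists table and mask flagged columns."""
    profiles = {col: profile_column(vals) for col, vals in table.items()}
    cleaned = {
        col: [anonymize(v) for v in vals] if profiles[col]["pii_type"] else list(vals)
        for col, vals in table.items()
    }
    return profiles, cleaned
```

Calling `profile_and_anonymize({"email": ["a@x.com", "b@y.org", None], "amount": [10, 20, 30]})` would flag the `email` column and hash its non-null values while leaving `amount` untouched. Salted hashing is only one possible masking choice; whether it is adequate depends on the applicable privacy requirements.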
Description
Internship Report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Keywords
Data Quality; Data Profile; Database; Data Warehouse; Cloud; Data Migration; Pandas Profiling; Personal Identifiable Information
