| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.88 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The quality of models produced via supervised machine learning depends on both
the learning algorithm used and the training data available to learn from. The work
presented in this paper focuses on optimizing training data directly and compares
different methods for generating synthetic training data while holding the learning
algorithm constant. In this paper, the author proposes a new algorithm that leverages
genetic programming to create a diverse population of data-generating programs,
which are then sampled to create training data for the given task. This is applied
within the context of building a robust text recognition model that can be integrated
into a broader document processing software solution that supports multiple domains.
Description
Dissertation presented as a partial requirement for obtaining the Master’s degree in Data Science and
Advanced Analytics
Keywords
Deep Learning; Novelty Search; Genetic Programming; Synthetic Data; Algorithms; Computer Vision; Document Processing; Document Understanding
