| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 12.19 MB | Adobe PDF |
Advisor(s)
Abstract(s)
Multitask Learning (MTL) is the process in which multiple problems are solved simultaneously, usually with shared representations or parameters. This approach is known to improve generalisation, performance, and training time. MTL in Genetic Programming (GP) has received limited attention, due to the assumption that GP tailors solutions to a specific task and that it is therefore challenging to share these amongst different problems. This work introduces a novel method that evolves a tree shared between two tasks (also called problems), in which the terminals are pairs of the original features of the datasets. The fitness function of the common evolution is the average performance on each task, in an attempt to balance the improvement of both. The impact of different metrics for the common fitness function was studied, specifically the difference between Root Mean Squared Error (RMSE) and Relative Squared Error (RSE). The results revealed that RSE is the more adequate measure, as it achieves a better balance in minimising the error across both tasks. The effectiveness of the Common Tree (CT) in reducing overfitting and program size was verified in some cases, but it is not evident under which conditions this effect is predictable. In addition, across all datasets, the common evolution produces trees that, for the same level of fitness, are smaller, indicating that it prevents bloat. Other conclusions relate to the nature of the relationship between tasks. It was verified that the problem domain is not sufficient to predict whether a joint evolution will be successful. Additionally, some problems were found to make better pairs than others, and their effectiveness is not symmetric. The CT is not expected to capture the full complexity of each problem, leading to poorer results when compared with Standard Genetic Programming (StdGP). This was verified in all cases except the Istanbul dataset. To tackle this issue, the CT is transformed into a new feature that is later inserted into the dataset of each problem. A final evolutionary process is conducted to leverage the information from both evolutions. The final results yield no difference when compared with StdGP.
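The common fitness described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names, the two-task tuple structure, and the `predict` interface are assumptions. The RSE normalises the squared error by the squared deviation from the target mean, which makes the per-task errors comparable before averaging, unlike the scale-dependent RMSE.

```python
import math

def rmse(y_true, y_pred):
    # Root Mean Squared Error: scale-dependent error measure
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rse(y_true, y_pred):
    # Relative Squared Error: squared error normalised by the squared
    # deviation from the target mean, so errors are comparable across tasks
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return num / den

def common_fitness(tasks, predict, metric=rse):
    # Average the per-task error of one shared tree (here abstracted
    # as a `predict` callable) to balance improvement across all tasks
    return sum(metric(y, predict(X)) for X, y in tasks) / len(tasks)
```

Note that a predictor that always outputs the target mean scores an RSE of exactly 1 on any task, which is what gives the averaged RSE a consistent reference point across the two tasks.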
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Genetic Programming, Symbolic Regression, Multitask Learning
