| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 12.19 MB | Adobe PDF |
Advisor(s)
Abstract(s)
Multitask Learning (MTL) is the process in which multiple problems are solved simultaneously, usually with shared representations or parameters. This approach is known to improve generalisation, performance, and training time. MTL in Genetic Programming (GP) has received limited attention, due to the assumption that GP tailors solutions to a specific task and that it is therefore challenging to share these amongst different problems. This work introduces a novel method that evolves a tree shared between two tasks (also called problems), in which the terminals are pairs of the original features of the datasets. The fitness function of the common evolution is the average performance on each task, in an attempt to balance the improvement of both. The impact of different metrics for the common fitness function was studied, specifically the difference between Root Mean Squared Error (RMSE) and Relative Squared Error (RSE). The results revealed that RSE is the more adequate measure, as it achieves a better balance in minimising the error across both tasks. The effectiveness of the Common Tree (CT) in reducing overfitting and program size was verified in some cases, but it is not evident under which conditions this effect is predictable. In addition, across all datasets, the common evolution produces trees that, for the same level of fitness, are smaller, indicating that it prevents bloat. Other conclusions relate to the nature of the relationship between tasks. It was verified that the problem domain is not sufficient to predict whether a joint evolution will be successful. Additionally, some problems were found to make better pairs than others, and their effectiveness is not symmetric. The CT is not expected to capture the full complexity of each problem, leading to poorer results when compared with Standard Genetic Programming (StdGP). This was verified in all cases except the Istanbul dataset. To tackle this issue, the CT is transformed into a new feature that is later inserted into the dataset of each problem. A final evolutionary process is conducted to leverage the information from both evolutions. The final results yield no difference when compared with StdGP.
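The common fitness described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names, the two-task tuple structure, and the `predict` interface are assumptions. The RSE normalises the squared error by the squared deviation from the target mean, which makes the per-task errors comparable before averaging, unlike the scale-dependent RMSE.

```python
import math

def rmse(y_true, y_pred):
    # Root Mean Squared Error: scale-dependent error measure
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rse(y_true, y_pred):
    # Relative Squared Error: squared error normalised by the squared
    # deviation from the target mean, so errors are comparable across tasks
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return num / den

def common_fitness(tasks, predict, metric=rse):
    # Average the per-task error of one shared tree (here abstracted
    # as a `predict` callable) to balance improvement across all tasks
    return sum(metric(y, predict(X)) for X, y in tasks) / len(tasks)
```

Note that a predictor that always outputs the target mean scores an RSE of exactly 1 on any task, which is what gives the averaged RSE a consistent reference point across the two tasks.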
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Genetic Programming, Symbolic Regression, Multitask Learning
