
Using Genetic Programming to Improve Data Collection for Offline Reinforcement Learning: An Evolutionary Approach for Data-Centric Reinforcement Learning

Use this identifier to reference this record.
Name: TCDMAA3472.pdf | Size: 4.59 MB | Format: Adobe PDF

Abstract(s)

Offline Reinforcement Learning (RL) learns policies solely from fixed, pre-collected datasets, making it applicable to use cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still, the questions of how this data is collected and which dataset characteristics are needed have not been thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), which combines the two learning paradigms to exploit their complementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline RL algorithms. A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified, and the effects of Semantic Genetic Operators (GOs) and bloat control on these metrics were assessed. Lastly, a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation than DQN and produced high-trajectory-quality datasets across all environments. More offline algorithms showed statistically significant performance gains with GP-collected data than when trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as GP consistently generated high-quality datasets.
This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.
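The abstract's core pipeline (roll out a behaviour policy such as a GP individual or a DQN agent, store the resulting transitions as a fixed dataset, then hand that dataset to an offline learner) can be illustrated with a minimal, hypothetical sketch. The `Corridor` environment, the `collect_dataset` helper, and the trivial `gp_policy` below are all illustrative stand-ins and are not taken from the thesis.

```python
import random

class Corridor:
    """Toy 1-D corridor: move right to reach the goal (hypothetical stand-in
    for the benchmark environments used in the thesis)."""
    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: +1 (right) or -1 (left); reward 1.0 only on reaching the goal
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def collect_dataset(policy, episodes=5, horizon=50):
    """Roll out a behaviour policy and record (s, a, r, s') transitions,
    mirroring the fixed-dataset setting that offline RL trains on."""
    env, data = Corridor(), []
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = policy(s)
            s2, r, done = env.step(a)
            data.append((s, a, r, s2))
            s = s2
            if done:
                break
    return data

# A trivial deterministic policy standing in for an evolved GP individual.
gp_policy = lambda s: 1

dataset = collect_dataset(gp_policy)
print(len(dataset), sum(r for _, _, r, _ in dataset))  # → 50 5.0
```

An offline learner would then be fit on `dataset` alone, with no further environment interaction; the study's comparison amounts to repeating this collection step with different behaviour policies (GP vs. DQN) and measuring the downstream offline performance.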

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
The dataset for this thesis can be accessed directly on GitHub: https://github.com/dropthedave/offlineRL_thesis
A record of the dataset is also available on the NOVA Research Portal: https://novaresearch.unl.pt/en/datasets/software-using-genetic-programming-to-improve-data-collection-for

Keywords

Offline Reinforcement Learning; Genetic Programming; Evolutionary Reinforcement Learning; Evolutionary Algorithms; Data Efficiency; SDG 9 - Industry, innovation and infrastructure
