Scalable DRL based routing with topology changes

Crispim, Catarina Amaro Guilherme dos Santos

Please use this identifier to cite or link to this item: http://hdl.handle.net/10362/177248

Title:	Scalable DRL based routing with topology changes
Author:	Crispim, Catarina Amaro Guilherme dos Santos
Advisor:	Amaral, Pedro
Keywords:	Multi agent; Deep Reinforcement Learning; Network routing
Defense Date:	May 2024
Abstract:	Optimizing network resources is a challenge, especially now that networks handle vast amounts of data and must dynamically serve different types of devices on the same infrastructure. Routing is particularly hard to optimize: the problem is hard to model, and solving it optimally is computationally intractable. Furthermore, network dynamism makes model-based solutions impractical. Deep Reinforcement Learning (DRL) is an Artificial Intelligence (AI) technique suited to solving control problems without a system model, and it has been proposed in the literature as a possible solution to the routing problem in networks. However, using DRL for routing has its own challenges. DRL agents have to be trained before being deployed and might not perform well when changes, such as link failures, occur in the network. There is also the dimensionality problem associated with the state and action spaces, which can be an issue for centralized approaches in many networks. In this work, a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) DRL architecture is proposed to solve the routing problem. Both central critic and distributed critic options are considered, using simple Q network and dueling Q network critic implementations. All combinations were trained in three different network topologies. To test the algorithms and evaluate their adaptability to small changes in the network, the original topologies used during agent training were changed, either through link failures or through the addition of new links, and a continuous learning technique was employed. It was observed that the dueling Q network is the better critic algorithm and that centralized critics perform better than local critics in the majority of scenarios. Regarding adaptability to network changes, the algorithms adapted to varying degrees. Moreover, the use of continuous learning further improved those results. Simulations in three different types of topologies show that the network topology is an important factor to consider when selecting the best algorithm combination for reducing performance loss in the presence of network changes.
URI:	http://hdl.handle.net/10362/177248
Degree:	Master in Electrical and Computer Engineering
Appears in Collections:	FCT: DEE - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
Crispim_2024.pdf		1.1 MB	Adobe PDF
