Towards Explaining Actions of Learning Agents

Rodrigues, Bruno; Knorr, Matthias; Krippahl, Ludwig; Gonçalves, Ricardo

http://hdl.handle.net/10362/201451

Use this identifier to cite or link to this record.

Name:	Description:	Size:	Format:
Rodrigues_et_al._2023._Towards_Explaining_Actions_of_Learning_Agents..pdf		982.98 KB	Adobe PDF

Abstract

Agents increasingly use Deep Neural Networks to process sensor information and make decisions. Although these models have been shown to provide excellent results, they behave like black boxes, mapping inputs to outputs in a way that is hard for humans to understand. This is a serious disadvantage because it makes it harder to predict how agents will act in unexpected situations. That is especially dangerous when agents have to interact physically with humans, as self-driving vehicles and industrial robots do, but it also creates risks for chat bots and other virtual agents, whose actions may result in legal liabilities or reputational damage. Being able to explain the decisions taken by the neural networks that guide these agents is important for preventing incorrect behavior, for building trust, and for providing legal justifications whenever necessary. This applies not only to interactions with humans, but also to multiagent systems. In this paper, we build on a recent framework for Explainable AI that uses small neural networks to map activations from a trained deep neural network to relevant concepts in a logical formalization of the domain, which in turn can be used to provide explanations for the outputs of the original network. Since this framework is applied to the deep neural network at inference time, after training, it can be applied to neural networks used in agents regardless of whether these were trained using supervised or reinforcement learning. We show that a potential bottleneck of the approach, the creation of such mapping networks, can be solved by employing automated neural architecture search. This paves the way towards applying the approach to more advanced use cases of explaining the decisions of agents based on deep neural networks, regardless of how these networks were trained.
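
The idea of a mapping network, and of searching for its architecture automatically, can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: the activations are synthetic random vectors standing in for activations extracted from a trained deep neural network, the concept is a made-up binary label, and an exhaustive loop over three hidden-layer sizes stands in for a real neural architecture search method. All function names are hypothetical.

```python
import numpy as np

def train_mapping_network(X, y, hidden, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer network that maps activations X to a binary concept y."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden representation
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))    # predicted concept probability
        g = (p - y) / len(y)                        # gradient of mean cross-entropy w.r.t. logits
        gH = np.outer(g, W2) * (1.0 - H ** 2)       # backpropagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return True where the mapping network says the concept holds."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2) > 0

def search_mapping_network(X_tr, y_tr, X_val, y_val, candidates=(1, 4, 16)):
    """Try each candidate hidden size and keep the one with the best validation accuracy."""
    best = None
    for hidden in candidates:
        params = train_mapping_network(X_tr, y_tr, hidden)
        acc = float(np.mean(predict(params, X_val) == (y_val > 0.5)))
        if best is None or acc > best[1]:
            best = (hidden, acc, params)
    return best

# Assumption: 8-dimensional synthetic "activations" and an invented concept
# that is encoded linearly in the first two dimensions.
rng = np.random.default_rng(42)
activations = rng.normal(size=(400, 8))
concept = (activations[:, 0] + activations[:, 1] > 0).astype(float)

X_tr, y_tr = activations[:300], concept[:300]
X_val, y_val = activations[300:], concept[300:]

best_hidden, best_acc, best_params = search_mapping_network(X_tr, y_tr, X_val, y_val)
print(f"selected hidden size: {best_hidden}, validation accuracy: {best_acc:.2f}")
```

The sketch selects the architecture by validation accuracy alone. The mapping network is trained on recorded activations only, so the original network is never modified.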

Keywords

Explanations; Neural Architecture; Reinforcement Learning

Collections

Home collection (FCT)
