Interpretability of deep neural networks at the model level

Martins, Joana Luís

http://hdl.handle.net/10362/135416

Use this identifier to reference this record.

Name:	Description:	Size:	Format:
Martins_2021.pdf		18.41 MB	Adobe PDF

Advisor(s)

Krippahl, Ludwig

Abstract

Deep neural networks (DNNs) are very powerful tools but remain black boxes. Convolutional neural networks (CNNs) are a type of DNN specialized for tasks such as image classification, image segmentation and object detection, and they are the focus of this study. Because applying these networks in areas such as medical imaging and autonomous driving carries risks, it is important to understand how CNNs behave and, in particular, how they reach their predictions. Such understanding will also benefit their further development. Currently, most interpretability techniques aim to provide a better understanding of a single network instance, and they often explain only the prediction for an individual input example. However, the same model (same architecture, task and dataset) can yield many different hypotheses simply by changing the initial conditions (weight initialization). These two sources of variability, between examples for the same hypothesis (model instance) and between hypotheses for the same example, are not taken into account by the vast majority of current interpretability methods. This raises the question of whether different hypotheses agree on which attributes the interpretability diagnostics capture as relevant for the predictions. In this work, tools and methods were developed to analyse these two forms of variability. They were applied to two interpretability diagnostics (saliency maps and sensitivity to occlusion) for several models with different architectures and tasks. Furthermore, it was shown how these tools can identify potential problems in the relevance of the attributes captured by the interpretability diagnostics in some of the scenarios studied. The method also provides a means to assess possible strategies for mitigating this issue.
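
The abstract names two diagnostics and two sources of variability. The Python/NumPy sketch below illustrates one plausible way to compute them. It is not the thesis's code, and several details are assumptions:

- `predict` is any function that returns the class score for a single 2-D grayscale image.
- Occlusion replaces a square patch with a constant `baseline`.
- Saliency is estimated by central finite differences instead of backpropagation.
- Between-hypothesis consistency is summarised as the mean pairwise Pearson correlation of the maps.
- Every function name, including the toy linear "hypotheses", is hypothetical.

```python
import itertools

import numpy as np


def saliency_map(predict, image, eps=1e-4):
    """Vanilla-gradient saliency |d score / d pixel|.

    Assumption: the gradient is estimated with central finite differences,
    standing in for the backpropagated gradient a real CNN would use.
    """
    image = np.asarray(image, dtype=float)
    grad = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        step = np.zeros_like(image)
        step[idx] = eps
        grad[idx] = (predict(image + step) - predict(image - step)) / (2 * eps)
    return np.abs(grad)


def occlusion_map(predict, image, patch=4, baseline=0.0):
    """Sensitivity to occlusion: drop in the score when each patch x patch
    square of the image is replaced by a constant baseline value."""
    image = np.asarray(image, dtype=float)
    reference = predict(image)
    heat = np.zeros_like(image)
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i:i + patch, j:j + patch] = reference - predict(occluded)
    return heat


def between_hypothesis_consistency(maps):
    """Mean pairwise Pearson correlation between the maps that different
    model instances (hypotheses) produce for the SAME example.
    The result is nan if any map is constant."""
    flat = [np.ravel(m) for m in maps]
    corrs = [np.corrcoef(a, b)[0, 1] for a, b in itertools.combinations(flat, 2)]
    return float(np.mean(corrs))


def between_example_variability(maps):
    """Per-pixel standard deviation of ONE hypothesis's maps across examples."""
    return np.std(np.stack(maps), axis=0)


def make_linear_hypothesis(seed, shape=(8, 8)):
    """Hypothetical stand-in for one trained model instance: a linear scorer
    whose weights depend only on the initialization seed."""
    weights = np.random.default_rng(seed).normal(size=shape)
    return lambda x: float(np.sum(weights * x))


if __name__ == "__main__":
    image = np.random.default_rng(42).random((8, 8))
    hypotheses = [make_linear_hypothesis(seed) for seed in range(3)]
    sal = [saliency_map(f, image) for f in hypotheses]
    occ = [occlusion_map(f, image) for f in hypotheses]
    print("saliency consistency:", between_hypothesis_consistency(sal))
    print("occlusion consistency:", between_hypothesis_consistency(occ))
```

With real CNNs, `predict` would wrap a trained network, and autodiff would normally replace the finite-difference step. A consistency score near 1 means the hypotheses highlight the same pixels. A low score signals the kind of disagreement between hypotheses described above.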

Keywords

Deep neural networks; convolutional neural networks; interpretability techniques

Collections

FCT: DI - Master's Dissertations (Dissertações de Mestrado)
