Adversarial Imagery: The Potential of Gamified Benchmarking in Evaluating Deep Generative Models of Images

Domingues, João Miguel Damas

http://hdl.handle.net/10362/200239

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
Domingues_2025.pdf		23.95 MB	Adobe PDF

Authors

Domingues, João Miguel Damas

Advisor(s)

Nóbrega, Rui

Valente, Pedro

Abstract(s)

Advances in technology and continued research in AI have given rise to deep generative models capable of producing content that could be mistaken for human-made work. Whether in a simple exchange with a conversational agent or in a dataset expanded with synthetic images for model training, generative models show promising potential. Benchmarking is therefore a critical step for users choosing a model, since it helps ensure that the generative model meets the requirements of the task at hand. Although many metrics exist to evaluate a model's quality, human perception remains an important factor, even though it can be challenging to gather enough participants for an evaluation to be considered reliable. This thesis therefore proposes a methodology for evaluating the quality of generative models of images by producing synthetic images and assessing their realism when set against the real images that conditioned their generation. In the process, we introduce Adversarial Imagery, a game inspired by the Turing test that incentivizes players to distinguish real images from AI-generated ones. With it, we demonstrate the potential of gamified benchmarking as a strategy for evaluating the realism of AI-generated images. We also aimed to better understand the capabilities and limitations of current technology and how much it takes to fool a human being. This thesis presents the data collected by Adversarial Imagery during a summative study (𝑁 = 129), which yielded 7,760 human-classified images and allowed three generative models to be ranked by the realism of their generated images. It also highlights the challenges of designing a system that aims to sustain a high level of engagement while ensuring the quality and trustworthiness of the collected data, drawing on a formative study (𝑁 = 22) that revealed the influence of the game's design on the collected data.

The evolution of technology and ongoing research in the field of AI have enabled the emergence of deep generative models capable of producing content that could be mistaken for the work of a human being. Whether it is a simple conversation with a conversational agent or a database expanded with synthetic images for model training, generative models show promising potential. However, benchmarking becomes a critical step for users choosing a model, to ensure that the generative model meets their requirements for the task at hand. Despite the existence of many metrics for evaluating a model's quality, people's perception remains an important factor, even though it can be challenging to gather enough people for an evaluation to be considered reliable. Accordingly, this thesis proposes a methodology for evaluating the quality of generative models of images by producing synthetic images and assessing their realism when confronted with a set of real images that conditioned their generation. In the process, we present a game inspired by the Turing test called Adversarial Imagery, which encourages players to distinguish real images from AI-generated images. As a result, we were able to demonstrate the potential of gamified benchmarking as a strategy for evaluating the realism of AI-generated images. In addition, we sought to better understand the capabilities and limitations of current technology and how much it takes to fool a human being. This thesis presents the data collected by Adversarial Imagery during a summative study (𝑁 = 129), which resulted in the collection of 7,760 human-classified images and allowed three generative models to be ranked by the realism of the generated images. It also highlights the challenges of designing a system that seeks to maintain a high level of interest while ensuring the quality and reliability of the collected data, considering the data obtained in a formative study (𝑁 = 22) that revealed the influence of the game's design on the collected data.

Keywords

Artificial Intelligence; Human-Computer Interaction; Deep Generative Models; AI-Generated Images; Turing Test; Gamified Benchmarking

URI

http://hdl.handle.net/10362/200239

Collections

FCT: DI - Master's Dissertations

CC License

CC BY
