| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 1.73 MB | Adobe PDF |
Abstract(s)
Despite their ability to generate human-like text and aid in various tasks, Large Language Models (LLMs) are susceptible to misuse. To mitigate this risk, many LLMs undergo safety alignment or refusal training so that they refuse unsafe or unethical requests. Even so, LLMs remain exposed to jailbreak attacks, i.e., adversarial techniques that manipulate the models into generating unsafe outputs. Jailbreaking typically involves crafting specific prompts or adversarial inputs that bypass the models' safety mechanisms. This paper examines the robustness of safety-aligned LLMs against adaptive jailbreak attacks, focusing on a genetic algorithm-based approach.
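The abstract names the technique but the poster's implementation is not reproduced here. The sketch below is a minimal, hypothetical illustration of a generic genetic-algorithm loop for evolving jailbreak prompts, not the authors' framework; every name in it (`query_model`, `score_response`, `MUTATION_TOKENS`) is an illustrative assumption.

```python
import random

# Hypothetical stand-ins for a real target LLM and a real safety judge;
# the paper's actual components are not public here.
def query_model(prompt: str) -> str:
    """Stub for the target LLM; replace with a real API call."""
    return "I cannot help with that."

def score_response(response: str) -> float:
    """Stub fitness: higher means the refusal behavior was bypassed."""
    return 0.0 if "cannot" in response.lower() else 1.0

MUTATION_TOKENS = ["please", "hypothetically", "in a story", "step by step"]

def mutate(prompt: str) -> str:
    """Append one random token -- a deliberately simple mutation operator."""
    return prompt + " " + random.choice(MUTATION_TOKENS)

def crossover(a: str, b: str) -> str:
    """One-point crossover on whitespace-tokenized prompts."""
    ta, tb = a.split(), b.split()
    cut = random.randint(1, min(len(ta), len(tb)))
    return " ".join(ta[:cut] + tb[cut:])

def evolve(seed: str, pop_size: int = 20, generations: int = 10) -> str:
    """Evolve prompt variants toward ones the judge scores as unsafe."""
    population = [mutate(seed) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population by how well each prompt bypasses the model.
        ranked = sorted(population,
                        key=lambda p: score_response(query_model(p)),
                        reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + [mutate(c) for c in children]
    return max(population, key=lambda p: score_response(query_model(p)))
```

In a real adaptive attack the fitness function would be a safety judge or classifier scoring the target model's responses; the stubs above only mark where those components would plug in.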
Description
Bonin, L., Cusin, L., De Lorenzo, A., Castelli, M., & Manzoni, L. (2025). A Genetic Algorithm Framework for Jailbreaking Large Language Models [poster]. In G. Ochoa (Ed.), GECCO '25 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 779-782). ACM - Association for Computing Machinery. https://doi.org/10.1145/3712255.3726687

This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) under the project UIDB/04152/2020 (DOI: 10.54499/UIDB/04152/2020) - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS - and the project 2024.07277.IACDC (Lexa).
Keywords
Genetic Algorithm; Large Language Model; Jailbreak; Adversarial Attack; Adaptive Attack; Artificial Intelligence; Software; Control and Optimization; Discrete Mathematics and Combinatorics; Logic; SDG 9 - Industry, Innovation, and Infrastructure
Publisher
ACM - Association for Computing Machinery
