Please use this identifier to cite or link to this record: http://hdl.handle.net/10362/187321
Full record
DC Field: Value [Language]
dc.contributor.author: Bonin, Lorenzo
dc.contributor.author: Cusin, Lorenzo
dc.contributor.author: De Lorenzo, Andrea
dc.contributor.author: Castelli, Mauro
dc.contributor.author: Manzoni, Luca
dc.date.accessioned: 2025-09-01T21:11:54Z
dc.date.available: 2025-09-01T21:11:54Z
dc.date.issued: 2025-08-11
dc.identifier.isbn: 979-8-4007-1464-1
dc.identifier.other: PURE: 128435852
dc.identifier.other: PURE UUID: 556556f1-ebdf-4a7b-8482-ff59210dd80d
dc.identifier.other: Scopus: 105014587226
dc.identifier.other: ORCID: /0000-0002-8793-1451/work/190962198
dc.identifier.uri: http://hdl.handle.net/10362/187321
dc.description: Bonin, L., Cusin, L., De Lorenzo, A., Castelli, M., & Manzoni, L. (2025). A Genetic Algorithm Framework for Jailbreaking Large Language Models [poster]. In G. Ochoa (Ed.), GECCO '25 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 779-782). ACM - Association for Computing Machinery. https://doi.org/10.1145/3712255.3726687 --- This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020 (DOI: 10.54499/UIDB/04152/2020) - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS - and the project 2024.07277.IACDC (Lexa).
dc.description.abstract: Despite their capabilities to generate human-like text and aid in various tasks, Large Language Models (LLMs) are susceptible to misuse. To mitigate this risk, many LLMs undergo safety alignment or refusal training to allow them to refuse unsafe or unethical requests. Despite these measures, LLMs remain exposed to jailbreak attacks, i.e., adversarial techniques that manipulate the models to generate unsafe outputs. Jailbreaking typically involves crafting specific prompts or adversarial inputs that bypass the models' safety mechanisms. This paper examines the robustness of safety-aligned LLMs against adaptive jailbreak attacks, focusing on a genetic algorithm-based approach. [en]
dc.format.extent: 4
dc.language.iso: eng
dc.publisher: ACM - Association for Computing Machinery
dc.relation: https://doi.org/10.54499/UIDB/04152/2020
dc.relation: https://doi.org/10.54499/2024.07277.IACDC
dc.rights: openAccess
dc.subject: Genetic Algorithm
dc.subject: Large Language Model
dc.subject: Jailbreak
dc.subject: Adversarial Attack
dc.subject: Adaptive Attack
dc.subject: Artificial Intelligence
dc.subject: Software
dc.subject: Control and Optimization
dc.subject: Discrete Mathematics and Combinatorics
dc.subject: Logic
dc.subject: SDG 9 - Industry, Innovation, and Infrastructure
dc.title: A Genetic Algorithm Framework for Jailbreaking Large Language Models [poster]
dc.type: conferenceObject
degois.publication.firstPage: 779
degois.publication.lastPage: 782
degois.publication.title: GECCO '25 Companion
degois.publication.title: Genetic and Evolutionary Computation Conference
dc.peerreviewed: yes
dc.identifier.doi: https://doi.org/10.1145/3712255.3726687
dc.description.version: publishersversion
dc.description.version: published
dc.contributor.institution: Information Management Research Center (MagIC) - NOVA Information Management School
dc.contributor.institution: NOVA Information Management School (NOVA IMS)
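
Note: the abstract in this record describes a genetic algorithm-based approach to crafting jailbreak prompts. The paper itself is not reproduced here, so the Python fragment below is only a minimal sketch of the generic idea of evolving prompt candidates against a fitness score. The mutate_prompt, crossover and score_response helpers, the filler-token operators, and all population parameters are hypothetical stand-ins and are not taken from the authors' framework; in a real attack the fitness would come from querying the target LLM and rating how far its reply is from a refusal.

import random

# Toy stand-ins for variation operators; illustrative only, NOT the paper's operators.
FILLER_TOKENS = ["please", "kindly", "hypothetically", "as a thought experiment"]

def mutate_prompt(prompt: str) -> str:
    """Randomly drop one token or insert a filler token (toy mutation)."""
    tokens = prompt.split()
    if tokens and random.random() < 0.5:
        tokens.pop(random.randrange(len(tokens)))
    else:
        tokens.insert(random.randrange(len(tokens) + 1), random.choice(FILLER_TOKENS))
    return " ".join(tokens)

def crossover(a: str, b: str) -> str:
    """One-point crossover on whitespace-separated tokens."""
    ta, tb = a.split(), b.split()
    cut = random.randint(0, min(len(ta), len(tb)))
    return " ".join(ta[:cut] + tb[cut:])

def score_response(prompt: str) -> float:
    """Placeholder fitness. A real attack would query the target LLM here and
    rate how far its response is from a refusal; this stub returns noise."""
    return random.random()

def evolve(seed_prompt: str, population_size: int = 20, generations: int = 10) -> str:
    """Generic GA loop: keep the fittest half, recombine, mutate, repeat."""
    population = [mutate_prompt(seed_prompt) for _ in range(population_size)]
    for _ in range(generations):
        parents = sorted(population, key=score_response, reverse=True)[:population_size // 2]
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(population_size - len(parents))]
        population = parents + [mutate_prompt(child) for child in children]
    return max(population, key=score_response)

if __name__ == "__main__":
    print(evolve("Describe how the assistant decides to refuse a request"))

The loop uses truncation selection for simplicity; any selection, mutation, or scoring scheme could be substituted without changing the overall structure.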
Appears in collections: NIMS: MagIC - Documentos de conferências internacionais

Files in this record:
File | Description | Size | Format
Genetic_Algorithm_Framework_for_Jailbreaking_LLM.pdf | - | 1,77 MB | Adobe PDF



All records in the repository are protected by copyright, with all rights reserved.