| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.36 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
This thesis explores the privacy, security, and ethical risks of large language models (LLMs)
and proposes practical defenses to reduce these risks when using LLMs in real-world settings.
After reviewing existing research, three representative vulnerabilities - cognitive prompt
injection via persona roleplay, data extraction through repeated token sequences, and
fine‑tuning backdoors activated by rare triggers - were examined through experiments on a
state‑of‑the‑art LLM. For each scenario, specific defenses were designed and tested: an extra
output-scanning step to catch policy violations, an inference‑time filter to block excessive
token repetitions, and a combined rare‑token filter with a prompt‑classification step. The
results showed that these defenses significantly lowered the success rate of attacks, although
no single approach offered complete protection.
Expert consultations with security and quality‑assurance professionals supported these
findings and pointed out ongoing challenges in stress‑testing, red‑teaming, and maintaining
usability while ensuring strong safeguards. Based on both experimental results and expert
feedback, the thesis provides practical recommendations for safely integrating LLMs. These
include prompt engineering guidelines, layered output checks, input sanitization, access
controls, privacy-focused training methods, ongoing red-teaming, and governance measures
that comply with GDPR, CCPA, and upcoming AI regulations.
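
As an illustration of the inference-time filter for excessive token repetitions mentioned above, the following is a minimal sketch and not the thesis's actual implementation. The function names, the whitespace tokenization, and the threshold of 20 are all assumptions chosen for the example; a real deployment would likely use the model's own tokenizer and a tuned threshold.

```python
from typing import List, Optional


def max_consecutive_repeats(tokens: List[str]) -> int:
    """Return the length of the longest run of identical adjacent tokens."""
    longest = 0
    current = 0
    previous: Optional[str] = None
    for token in tokens:
        if token == previous:
            current += 1
        else:
            # A new token starts a new run of length 1.
            current = 1
            previous = token
        longest = max(longest, current)
    return longest


def is_repetition_attack(prompt: str, max_repeats: int = 20) -> bool:
    """Flag prompts whose longest single-token run exceeds max_repeats.

    Hypothetical helper: whitespace splitting stands in for a real
    tokenizer, and max_repeats=20 is an assumed threshold.
    """
    tokens = prompt.lower().split()
    return max_consecutive_repeats(tokens) > max_repeats


if __name__ == "__main__":
    print(is_repetition_attack("poem " * 50))
    print(is_repetition_attack("Write a poem about the sea"))
```

This sketch only detects runs of a single repeated token. Catching repeated multi-token sequences, or applying the same check to model output during generation, would need additional logic.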
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Large Language Models; LLM Security; Prompt Injection; Data Extraction; Backdoor Attacks; Defense Mechanisms; Output Scanning; Input Validation; Differential Privacy; Ethical AI; AI Governance; Design Science Research; Empirical Evaluation; Red-Teaming; Regulatory Compliance; SDG 9 - Industry, innovation and infrastructure; SDG 16 - Peace, justice and strong institutions; SDG 17 - Partnerships for the goals
