ArtiFlow: An Integrated Pipeline for Identifying and Analyzing Speech Disfluencies: A Deep Learning Framework for Enhanced Transcription, Classification and Interpretation of Disfluent Speech

Pérez, Ariel Enrique Cerda

http://hdl.handle.net/10362/190288

Use this identifier to cite or link to this record.

Name:	Description:	Size:	Format:
TCDMAA4683.pdf		2.54 MB	Adobe PDF

Authors

Pérez, Ariel Enrique Cerda

Advisor(s)

Bação, Fernando José Ferreira Lucas

Abstract

Automatic Speech Recognition (ASR) systems struggle with speech disfluencies like repetitions and filled pauses, which are common in spontaneous speech and disorders such as stuttering. This performance gap limits the accessibility and clinical utility of ASR technology. This research proposes a comprehensive Deep Learning framework to improve the transcription, classification, and interpretation of disfluent speech. The multi-stage methodology begins by fine-tuning state-of-the-art ASR models on the FluencyBank corpus to accurately transcribe disfluent events. Next, a custom-trained ModernBERT model performs token-level classification to identify and label specific disfluencies in the transcript. Finally, a specialized Large Language Model (LLM) provides a structured, context-aware analysis of the identified patterns. The integrated pipeline is demonstrated through a functional prototype, ArtiFlow (Articulate Flow), which showcases the system's end-to-end capabilities. Empirical results validate the framework's effectiveness. After a comparative analysis of different ASR model families, the Whisper Large v3 Turbo model was selected, offering an optimal balance between high transcription accuracy (12.94% Word Error Rate) and enhanced inference speed, while leveraging a stable and accessible implementation framework. The subsequent disfluency classifier performs with high reliability (weighted F1-score of 0.9512) and the selected LLM demonstrates strong capabilities in generating structured analyses. This thesis establishes a robust, modular system for both identifying and interpreting speech disfluencies, providing a foundation for advanced support tools. The work contributes to speech processing and assistive technology, offering significant potential for applications in clinical diagnosis, speech therapy, and the development of more inclusive human-computer interaction systems.
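
The two metrics quoted in the abstract, Word Error Rate for the ASR stage and weighted F1 for the token-level classifier, can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names, the whitespace tokenization, and the label names (`O`, `REP`, `FP`) are assumptions made here for illustration, and the thesis may normalize text or define labels differently.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-label F1, averaged with weights proportional to label support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    support = Counter(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / len(y_true)
    return score


if __name__ == "__main__":
    # A transcript that drops a repetition and a filled pause is penalized
    # with two deletions against the verbatim (disfluent) reference.
    print(word_error_rate("i i want to um go home", "i want to go home"))

    # Hypothetical token labels: O = fluent, REP = repetition, FP = filled pause.
    print(weighted_f1(["O", "O", "REP", "FP"], ["O", "REP", "REP", "FP"]))
```

The first example shows why disfluent speech inflates WER when the reference transcript is verbatim: a model that silently "cleans up" the repetition and the filled pause is scored as having made two errors out of seven reference words.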

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

Keywords

Automatic Speech Recognition; Speech Disfluencies; Large Language Models; Spoken Language Processing; Stuttering; SDG 3 - Good health and well-being

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics

CC License

CC BY (Creative Commons Attribution)