Enhancing Automatic Speech Recognition

Perezhohin, Yuriy; Santos, Tiago; Costa, Victor; Peres, Fernando; Castelli, Mauro

doi:https://doi.org/10.1109/ACCESS.2024.3482970

Please use this identifier to cite or link to this record: http://hdl.handle.net/10362/173966

Title:	Enhancing Automatic Speech Recognition
Authors:	Perezhohin, Yuriy; Santos, Tiago; Costa, Victor; Peres, Fernando; Castelli, Mauro
Keywords:	Automatic Speech Recognition; Contrastive Learning; Data Augmentation; Embeddings; Synthetic Data Filtering; Text-to-Speech; Computer Science(all); Materials Science(all); Engineering(all); SDG 8 - Decent Work and Economic Growth; SDG 9 - Industry, Innovation, and Infrastructure
Date:	31-Dec-2024
Abstract:	This paper presents a novel methodology for enhancing Automatic Speech Recognition (ASR) performance by utilizing contrastive learning to filter synthetic audio data. We address the challenge of incorporating synthetic data into ASR training, especially in scenarios with limited real-world data or unique linguistic characteristics. The method utilizes a contrastive learning model to align representations of synthetic audio and its corresponding text transcripts, enabling the identification and removal of low-quality samples that do not align well semantically. We evaluate the methodology on a medium-resource language across two distinct datasets: a general-domain dataset and a regionally specific dataset characterized by unique pronunciation patterns. Experimental results reveal that the optimal filtering strategy depends on both model capacity and dataset characteristics. Larger models, like Whisper Large V3, particularly benefit from aggressive filtering, while smaller models may not require such stringent
Description:	Perezhohin, Y., Santos, T., Costa, V., Peres, F., & Castelli, M. (2024). Enhancing Automatic Speech Recognition: Effects of Semantic Audio Filtering on Models Performance. IEEE Access, 12, 155136-155150. https://doi.org/10.1109/ACCESS.2024.3482970 --- This work was supported by MyNorth AI Research. This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020 (DOI: 10.54499/UIDB/04152/2020) - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
Peer review:	yes
URI:	http://hdl.handle.net/10362/173966
DOI:	https://doi.org/10.1109/ACCESS.2024.3482970
ISSN:	2169-3536
Appears in collections:	NIMS: MagIC - Artigos em revista internacional com arbitragem científica (Peer-Review articles in international journals)
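
The filtering idea described in the abstract (scoring how well a synthetic audio clip aligns with its transcript, then dropping poorly aligned samples) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from this record: the use of cosine similarity as the alignment score, the fixed `threshold`, the toy two-dimensional embeddings, and the hypothetical function names `cosine_similarity` and `filter_synthetic_samples`. In practice the embeddings would come from a trained contrastive audio-text model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_synthetic_samples(
    audio_embeddings: Sequence[Sequence[float]],
    text_embeddings: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return indices of samples whose audio/text similarity is >= threshold.

    Assumption: audio_embeddings[i] and text_embeddings[i] belong to the
    same synthetic sample and live in a shared embedding space.
    """
    if len(audio_embeddings) != len(text_embeddings):
        raise ValueError("audio and text embedding lists must have equal length")
    kept = []
    for index, (audio, text) in enumerate(zip(audio_embeddings, text_embeddings)):
        if cosine_similarity(audio, text) >= threshold:
            kept.append(index)
    return kept


if __name__ == "__main__":
    # Toy embeddings (hypothetical values, for illustration only).
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    text = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]]
    # The second sample is orthogonal to its transcript and is filtered out.
    print(filter_synthetic_samples(audio, text, threshold=0.9))
```

Raising `threshold` corresponds to more aggressive filtering: fewer synthetic samples are kept, but those that remain align more closely with their transcripts.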

Files in this record:

File	Description	Size	Format
Enhancing_Automatic_Speech_Recognition_Effects_of_Semantic_Audio_Filtering_on_Models_Performance.pdf		1.73 MB	Adobe PDF
