Gaussian Vector Quantization: Extracting Effective Speech Representations from Audio Data

Marques, Alexandre Guilherme

http://hdl.handle.net/10362/190755

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
TCDMAA4356.pdf		5.64 MB	Adobe PDF


Authors

Marques, Alexandre Guilherme

Advisor(s)

Castelli, Mauro

Abstract(s)

Differentiable vector quantization has become a prerequisite for the development of deep encoding models. Standard vector quantization methods are either only approximately differentiable or unable to fully capture the local distribution of their inputs. In this work we introduce the Gaussian vector quantization method, the first fully differentiable local vector quantization method. At its core, the method consists of a Gaussian mixture modeling layer that learns a Gaussian mixture distribution over its input data. The proposed implementation has $O(MN^2)$ computational complexity for the forward and backward passes, as opposed to the $O(MN^3)$ time complexity of the naive implementation. We apply Gaussian vector quantization to audio encoding and verify that it generates more effective contextual representations of speech data than the standard vector quantization methods.
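
To make the idea of a Gaussian mixture modeling layer more concrete, the following is a minimal NumPy sketch of soft quantization with a diagonal-covariance Gaussian mixture. It is an illustration under stated assumptions, not the implementation from the dissertation: the function name `gaussian_soft_quantize`, the diagonal covariances, and the choice of returning the responsibility-weighted mean of the components are all hypothetical. Every step is a smooth function of the inputs and parameters, which is the property the abstract refers to as full differentiability.

```python
import numpy as np


def gaussian_soft_quantize(x, means, log_vars, log_weights):
    """Soft-assign inputs to Gaussian components (hypothetical sketch).

    x:           (B, D) batch of input vectors
    means:       (K, D) component means (the "codebook")
    log_vars:    (K, D) log-variances; diagonal covariance is an assumption
    log_weights: (K,)   log mixture weights
    Returns the (B, D) soft-quantized vectors and the (B, K) responsibilities.
    """
    dim = x.shape[1]
    diff = x[:, None, :] - means[None, :, :]      # (B, K, D)
    inv_var = np.exp(-log_vars)[None, :, :]       # (1, K, D)
    # Log-density of each input under each diagonal Gaussian component.
    log_pdf = -0.5 * (
        np.sum(diff ** 2 * inv_var, axis=-1)
        + np.sum(log_vars, axis=-1)[None, :]
        + dim * np.log(2.0 * np.pi)
    )                                             # (B, K)
    # Posterior responsibilities via a numerically stable softmax.
    logits = log_weights[None, :] + log_pdf
    logits = logits - logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp = resp / resp.sum(axis=1, keepdims=True)
    # Soft quantization: responsibility-weighted average of the means.
    quantized = resp @ means                      # (B, D)
    return quantized, resp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))
    means = rng.normal(size=(5, 3))
    q, r = gaussian_soft_quantize(
        x, means, np.zeros((5, 3)), np.log(np.full(5, 0.2))
    )
    print(q.shape, r.shape)
```

With diagonal covariances the cost per input and component is linear in the dimension. A full-covariance variant would be needed to discuss the quadratic versus cubic costs quoted in the abstract, and this sketch does not attempt that.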

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

Keywords

audio encoding; speech encoding; vector quantization; Gaussian mixture modeling layer; Gaussian vector quantization; SDG 9 - Industry, innovation and infrastructure

URI

http://hdl.handle.net/10362/190755

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics

CC License

CC BY (Creative Commons Attribution)
