Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/26025
Title: Comparing spectrum estimators in speaker verification under additive noise degradation
Authors: Kinnunen, Tomi H.
Saeidi, Rahim
Pohjalainen, Jouni
Alku, Paavo
Sandberg, Johan
Hansson-Sandsten, Maria
Uludağ University/Faculty of Engineering/Department of Electronic Engineering.
Hanilci, Cemal
Ertaş, Figen
AAH-4188-2021
S-4967-2016
35781455400
24724154500
Keywords: Acoustics
Engineering
Spectrum estimation
Speaker verification
Weighted linear prediction
Speech
Recognition
Acoustic noise
Additive noise
Discrete
Signal processing
Spectrum analysis
Babble noise
DFT method
Equal error rate
Linear prediction
Mel-frequency cepstral coefficients
Minimum variance distortionless response
Noise contamination
Noise degradations
Recognition performance
Speaker recognition
Fourier transforms
Spectrum estimators
Speech frames
Speech recognition
Issue Date: 2012
Publisher: IEEE
Citation: Hanilci, C. et al. (2012). "Comparing spectrum estimators in speaker verification under additive noise degradation". 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4769-4772.
Abstract: Different short-term spectrum estimators for speaker verification under additive noise are considered. Conventionally, mel-frequency cepstral coefficients (MFCCs) are computed from discrete Fourier transform (DFT) spectra of windowed speech frames. Recently, linear prediction (LP) and its temporally weighted variants have been substituted as the spectrum analysis method in speech and speaker recognition. In this paper, 12 different short-term spectrum estimation methods are compared for speaker verification under additive noise contamination. Experiments conducted on the NIST 2002 SRE corpus show that the spectrum estimation method has a large effect on recognition performance: the stabilized weighted LP (SWLP) and minimum variance distortionless response (MVDR) methods yield approximately 7% and 8% relative improvements in equal error rate (EER) over the standard DFT method at the -10 dB SNR level of factory and babble noise, respectively.
Description: This work was presented as a paper at the IEEE International Conference on Acoustics, Speech and Signal Processing, held in Kyoto, Japan, on 25-30 March 2012.
URI: https://doi.org/10.1109/ICASSP.2012.6288985
https://ieeexplore.ieee.org/document/6288985
http://hdl.handle.net/11452/26025
ISBN: 978-1-4673-0046-9
ISSN: 1520-6149
Appears in Collections: Scopus
Web of Science

Files in This Item:
File: Hanilci_vd_2012.pdf
Size: 306.62 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.