Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/30193
Title: Using group delay functions from all-pole models for speaker recognition
Authors: Rajan, Padmanabhan
Kinnunen, Tomi H.
Pohjalainen, Jouni
Alku, Paavo
Editors: Bimbot, F.
Cerisara, C.
Fougeron, C.
Gravier, G.
Lamel, L.
Pellegrino, F.
Perrier, P.
Uludağ University/Faculty of Engineering/Department of Electrical and Electronics Engineering.
Hanilçi, Cemal
ResearcherID: S-4967-2016
Scopus Author ID: 35781455400
Keywords: Computer science
Engineering
Speaker verification
Group delay functions
High vocal effort
Additive noise
Verification
Discrete Fourier transforms
Group delay
Poles
Signal processing
Speech processing
Direct computations
Mel-frequency cepstral coefficients
Recognition accuracy
Speaker recognition
Speaker recognition evaluations
Vocal efforts
Speech recognition
Issue Date: 2013
Publisher: ISCA-International Speech Communication Association
Citation: Rajan, P. et al. (2013). "Using group delay functions from all-pole models for speaker recognition". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols. 1-5, pp. 2488-2492.
Abstract: Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument for using only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored because of the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide accuracy comparable or superior to that of conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.
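The abstract contrasts two routes to the group delay function: direct computation from the DFT of a speech frame, and evaluation on a fitted all-pole model. The Python/NumPy code below is a minimal sketch of that contrast, not the authors' implementation; this record does not state their all-pole estimator, model order, framing or feature post-processing. It assumes the all-pole model is estimated with the standard autocorrelation method and Levinson-Durbin recursion, and evaluates the group delay of H(z) = 1/A(z) in closed form through the identity tau(w) = Re{DFT(n a[n]) / DFT(a[n])}, negated because 1/A(z) has the opposite phase of A(z). The function names, the 512-point FFT, the order-12 model, the `eps` regularizer and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Estimate all-pole (LPC) coefficients a = [1, a1, ..., ap] with the
    autocorrelation method and the Levinson-Durbin recursion, so that the
    model is H(z) = 1 / A(z) with A(z) = sum_k a[k] z^-k."""
    n = len(frame)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a[1..i-1] using the previous-order coefficients in reverse
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def group_delay_all_pole(a, nfft=512):
    """Group delay (in samples) of H(z) = 1/A(z) at nfft//2 + 1 frequencies
    in [0, pi]: tau_A(w) = Re{FFT(n a[n]) / FFT(a[n])} and tau_H = -tau_A."""
    n = np.arange(len(a))
    A = np.fft.rfft(a, nfft)
    B = np.fft.rfft(n * a, nfft)
    return -np.real(B / A)


def group_delay_dft(frame, nfft=512, eps=1e-12):
    """Direct group delay of a frame: Re{Y X*} / |X|^2 with X = FFT(x[n]) and
    Y = FFT(n x[n]). eps is a hypothetical regularizer; large spikes still
    appear wherever |X| is close to zero."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, nfft)
    Y = np.fft.rfft(n * frame, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


def synth_ar(a, length, seed=0):
    """Synthetic test signal: white noise filtered by 1/A(z), i.e.
    x[t] = e[t] - sum_{k>=1} a[k] x[t-k]."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(length)
    x = np.zeros(length)
    for t in range(length):
        past = x[max(0, t - len(a) + 1):t][::-1]  # x[t-1], x[t-2], ...
        x[t] = e[t] - np.dot(a[1:1 + len(past)], past)
    return x


if __name__ == "__main__":
    # Hypothetical two-pole resonance (pole radius sqrt(0.8)), Hamming-windowed frame
    x = synth_ar([1.0, -1.3, 0.8], 4000)
    frame = np.hamming(200) * x[1000:1200]
    a = lpc_autocorr(frame, order=12)
    gd_model = group_delay_all_pole(a)
    gd_direct = group_delay_dft(frame)
    print("all-pole GD range (samples):", gd_model.min(), gd_model.max())
    print("direct GD range (samples):  ", gd_direct.min(), gd_direct.max())
```

The design choice matters for robustness: the autocorrelation method yields a minimum-phase A(z), so A(e^{jw}) has no zeros on the unit circle and the all-pole group delay stays bounded, whereas the direct version divides by |X|^2, which can come arbitrarily close to zero at any frequency of a real frame.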
Description: This paper was presented at the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), held in Lyon, France, on 25-29 August 2013.
URI: http://faculty.iitmandi.ac.in/~padman/papers/padman_gdAllPole_interspeech2013.pdf
http://hdl.handle.net/11452/30193
ISSN: 2308-457X
Appears in Collections: Scopus
Web of Science

Files in This Item:
File: Hanilci_vd_2013.pdf (Size: 123.35 kB, Format: Adobe PDF)


This item is licensed under a Creative Commons License.