Using group delay functions from all-pole models for speaker recognition

Rajan, Padmanabhan; Kinnunen, Tomi H.; Pohjalainen, Jouni; Alku, Paavo; Bimbot, F.; Cerisara, C.; Fougeron, C.; Gravier, G.; Lamel, L.; Pellegrino, F.; Perrier, P.

To cite or link to this item, use this identifier: http://hdl.handle.net/11452/30193

Title:	Using group delay functions from all-pole models for speaker recognition
Authors:	Rajan, Padmanabhan; Kinnunen, Tomi H.; Pohjalainen, Jouni; Alku, Paavo; Bimbot, F.; Cerisara, C.; Fougeron, C.; Gravier, G.; Lamel, L.; Pellegrino, F.; Perrier, P.; Uludağ University/Faculty of Engineering/Department of Electrical and Electronics Engineering. Hanilçi, Cemal; S-4967-2016; 35781455400
Keywords:	Computer science; Engineering; Speaker verification; Group delay functions; High vocal effort; Additive noise; Verification; Discrete Fourier transforms; Group delay; Poles; Signal processing; Speech processing; Direct computations; Mel-frequency cepstral coefficients; Recognition accuracy; Speaker recognition; Speaker recognition evaluations; Vocal efforts; Speech recognition
Publication Date:	2013
Publisher:	Isc-Int Speech Communication Association
Citation:	Rajan, P. et al. (2013). "Using group delay functions from all-pole models for speaker recognition". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 1-5, 2488-2492.
Abstract:	Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument for using only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored because of the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.
Description:	This work was presented as a paper at the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), held in Lyon, France, on 25-29 August 2013.
URI:	http://faculty.iitmandi.ac.in/~padman/papers/padman_gdAllPole_interspeech2013.pdf http://hdl.handle.net/11452/30193
ISSN:	2308-457X
Appears in Collections:	Scopus; Web of Science
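
The abstract contrasts group delay computed directly from the discrete Fourier transform with group delay derived from a parametric all-pole model. The following is a minimal sketch of the all-pole route, not the authors' feature extraction pipeline, whose details are not part of this record. It assumes autocorrelation-method linear prediction solved with the Levinson-Durbin recursion; the function names (autocorrelation, levinson_durbin, allpole_group_delay) are hypothetical, and only the Python standard library is used so that the block stays self-contained.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations.

    Returns a = [1, a_1, ..., a_p] so that A(z) = 1 + sum_k a_k z^-k
    and the all-pole model is H(z) = 1 / A(z) (gain ignored).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # updated prediction error
    return a


def allpole_group_delay(a, n_freqs):
    """Group delay (in samples) of H(z) = 1 / A(z) on n_freqs points in [0, pi).

    For a finite sequence a[k] with spectrum A(w), the group delay of A is
    Re(D(w) / A(w)), where D(w) is the spectrum of k * a[k]. The phase of
    1 / A is the negative of the phase of A, hence the sign flip.
    """
    delays = []
    for m in range(n_freqs):
        w = math.pi * m / n_freqs
        spec = sum(a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a))
        dspec = sum(k * a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a))
        delays.append(-(dspec / spec).real)
    return delays


# Illustrative use on a synthetic decaying frame (an assumption, not speech data).
demo_frame = [0.5 ** n for n in range(200)]
demo_coeffs = levinson_durbin(autocorrelation(demo_frame, 2), 2)
demo_delay = allpole_group_delay(demo_coeffs, 8)
```

Because the denominator here is the low-order polynomial A rather than the raw frame spectrum, the division involves a smooth model spectrum. A real front end would additionally need framing, windowing, and a choice of model order, none of which are specified in this record.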

Files in This Item:

File	Description	Size	Format
Hanilci_vd_2013.pdf		123.35 kB	Adobe PDF


This item is licensed under a Creative Commons License
