Using group delay functions from all-pole models for speaker recognition

Rajan, Padmanabhan; Kinnunen, Tomi H.; Pohjalainen, Jouni; Alku, Paavo; Bimbot, F.; Cerisara, C.; Fougeron, C.; Gravier, G.; Lamel, L.; Pellegrino, F.; Perrier, P.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/30193

Full metadata record

DC Field	Value	Language
dc.contributor.author	Rajan, Padmanabhan	-
dc.contributor.author	Kinnunen, Tomi H.	-
dc.contributor.author	Pohjalainen, Jouni	-
dc.contributor.author	Alku, Paavo	-
dc.contributor.author	Bimbot, F.	-
dc.contributor.author	Cerisara, C.	-
dc.contributor.author	Fougeron, C.	-
dc.contributor.author	Gravier, G.	-
dc.contributor.author	Lamel, L.	-
dc.contributor.author	Pellegrino, F.	-
dc.contributor.author	Perrier, P.	-
dc.date.accessioned	2022-12-30T11:58:03Z	-
dc.date.available	2022-12-30T11:58:03Z	-
dc.date.issued	2013	-
dc.identifier.citation	Rajan, P. et al. (2013). "Using group delay functions from all-pole models for speaker recognition". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 1-5, 2488-2492.	en_US
dc.identifier.issn	2308-457X	-
dc.identifier.uri	http://faculty.iitmandi.ac.in/~padman/papers/padman_gdAllPole_interspeech2013.pdf	-
dc.identifier.uri	http://hdl.handle.net/11452/30193	-
dc.description	This work was presented as a paper at the 14th Annual Conference of the International Speech Communication Association [Interspeech 2013], held in Lyon [France] on 25-29 August 2013.	en_US
dc.description.abstract	Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument to use only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.	en_US
dc.description.sponsorship	Academy of Finland (253120)	en_US
dc.description.sponsorship	Int Speech Commun Association	en_US
dc.description.sponsorship	Amazon	en_US
dc.description.sponsorship	Microsoft	en_US
dc.description.sponsorship	Google	en_US
dc.description.sponsorship	TcL SYTRAL	en_US
dc.description.sponsorship	European Language Resources Association	en_US
dc.description.sponsorship	Ouaero	en_US
dc.description.sponsorship	Imaginove	en_US
dc.description.sponsorship	VOCAPIA Research	en_US
dc.description.sponsorship	Acapela	en_US
dc.description.sponsorship	Speech Ocean	en_US
dc.description.sponsorship	ALDEBARAN	en_US
dc.description.sponsorship	Orange	en_US
dc.description.sponsorship	Vecsys	en_US
dc.description.sponsorship	IBM Research	en_US
dc.description.sponsorship	Raytheon BBN Technology	en_US
dc.description.sponsorship	Voxygen	en_US
dc.language.iso	en	en_US
dc.publisher	Isc-Int Speech Communication Association	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Computer science	en_US
dc.subject	Engineering	en_US
dc.subject	Speaker verification	en_US
dc.subject	Group delay functions	en_US
dc.subject	High vocal effort	en_US
dc.subject	Additive noise	en_US
dc.subject	Verification	en_US
dc.subject	Discrete Fourier transforms	en_US
dc.subject	Group delay	en_US
dc.subject	Poles	en_US
dc.subject	Signal processing	en_US
dc.subject	Speech processing	en_US
dc.subject	Direct computations	en_US
dc.subject	Mel-frequency cepstral coefficients	en_US
dc.subject	Recognition accuracy	en_US
dc.subject	Speaker recognition	en_US
dc.subject	Speaker recognition evaluations	en_US
dc.subject	Vocal efforts	en_US
dc.subject	Speech recognition	en_US
dc.title	Using group delay functions from all-pole models for speaker recognition	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.wos	000395050001036	tr_TR
dc.identifier.scopus	2-s2.0-84906257507	tr_TR
dc.relation.publicationcategory	Conference Item - International	en_US
dc.contributor.department	Uludağ University/Faculty of Engineering/Department of Electrical and Electronics Engineering.	en_US
dc.identifier.startpage	2488	tr_TR
dc.identifier.endpage	2492	tr_TR
dc.identifier.volume	1-5	tr_TR
dc.relation.journal	14th Annual Conference of the International Speech Communication Association (Interspeech 2013)	en_US
dc.contributor.buuauthor	Hanilçi, Cemal	-
dc.contributor.researcherid	S-4967-2016	tr_TR
dc.subject.wos	Computer science, artificial intelligence	en_US
dc.subject.wos	Engineering, electrical & electronic	en_US
dc.indexed.wos	CPCIS	en_US
dc.indexed.scopus	Scopus	en_US
dc.contributor.scopusid	35781455400	tr_TR
dc.subject.scopus	Speaker Verification; Speech Enhancement; Attack	en_US
Appears in Collections:	Scopus; Web of Science
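
The abstract above contrasts group delay functions obtained from a parametric all-pole model with those computed directly from the discrete Fourier transform. As a rough illustration only, the sketch below fits an all-pole model H(z) = 1/A(z) to one frame and evaluates its group delay as -Re{DFT(n a[n]) / DFT(a[n])}. It assumes ordinary autocorrelation-method linear prediction with a Hamming window, which is not necessarily the modelling used in the paper. The function names (lpc_autocorr, allpole_group_delay) and the default n_fft are hypothetical, and none of the paper's feature post-processing is reproduced.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Autocorrelation-method linear prediction (assumed variant).

    Returns the coefficients [1, a_1, ..., a_p] of A(z).
    """
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation at lags 0..order (zero lag sits at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Symmetric Toeplitz matrix with R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + 1e-9 * r[0] * np.eye(order)  # tiny diagonal loading
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def allpole_group_delay(a, n_fft=512):
    """Group delay (in samples) of H(z) = 1/A(z) on the rfft frequency grid."""
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)      # A(e^{jw})
    B = np.fft.rfft(n * a, n_fft)  # DFT of n * a[n]
    # Re(B / A) is the group delay of A(z); inverting the filter flips the sign.
    return -np.real(B / A)


if __name__ == "__main__":
    # Synthetic frame: impulse response of a resonator with a pole pair at
    # radius 0.95 and angle pi/4 (illustrative values, not from the paper).
    rad, theta = 0.95, np.pi / 4
    k = np.arange(400)
    frame = rad ** k * np.sin((k + 1) * theta) / np.sin(theta)
    a_hat = lpc_autocorr(frame, order=2)
    tau = allpole_group_delay(a_hat)
    print("estimated A(z) coefficients:", a_hat)
    print("group delay peaks at bin", int(np.argmax(tau)), "of", len(tau))
```

Because the group delay is computed from the model coefficients rather than by differentiating an unwrapped DFT phase, no phase unwrapping is involved. A sharp resonance in the model appears as a peak in the group delay near the pole angle.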

Files in This Item:

File	Description	Size	Format
Hanilci_vd_2013.pdf		123.35 kB	Adobe PDF

This item is licensed under a Creative Commons License