Speaker identification from shouted speech: Analysis and compensation

Kinnunen, Tomi; Saeidi, Rahim; Pohjalainen, Jouni; Alku, Paavo; Hanilçi, Cemal; Ertaş, Figen

Please use this identifier to cite or link to this item: http://hdl.handle.net/11452/32501

Title:	Speaker identification from shouted speech: Analysis and compensation
Authors:	Kinnunen, Tomi; Saeidi, Rahim; Pohjalainen, Jouni; Alku, Paavo. Uludağ University, Faculty of Engineering, Department of Electrical and Electronics Engineering: Hanilçi, Cemal; Ertaş, Figen. Author identifiers: AAH-4188-2021, S-4967-2016, 35781455400, 24724154500
Keywords:	Acoustics; Engineering; Speaker identification; Shouted speech; Loudspeakers; Mapping; Signal processing; Speech; Emotional speech; Gaussian mixture model; Identification accuracy; Mapping techniques; Mel-frequency cepstral coefficients; Recognition accuracy; Text-independent speaker identification; Speech recognition
Issue Date:	2013
Publisher:	IEEE
Citation:	Hanilçi, C. et al. (2013). “Speaker identification from shouted speech: Analysis and compensation”. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8027-8031.
Abstract:	Text-independent speaker identification is studied using neutral and shouted Finnish speech to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100 %) to 8.71 % under vocal mode mismatch. Because of this dramatic degradation, we propose a joint density GMM mapping technique to compensate the MFCC features. The mapping is trained on a disjoint emotional speech corpus, yielding an emotion-neutralizing mapping that is completely independent of both speaker and speech mode. With this compensation, identification accuracy under mismatch increases from 8.71 % to 32.00 %, with only minor degradation in the matched train-test conditions.
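The abstract names two components: MFCC/GMM speaker identification and a joint density GMM (JD-GMM) feature mapping. This record does not give the paper's exact formulation, so the following is only a minimal Python/NumPy sketch under stated assumptions. The mapping shown is the standard minimum mean-square error (conditional expectation) estimate used in GMM-based voice conversion, E[y | x] = sum_k P(k | x) [mu_k^y + S_k^yx (S_k^xx)^-1 (x - mu_k^x)], where x is a shouted or emotional MFCC frame and y its neutral counterpart. The joint GMM parameters are assumed to be already trained (for example by EM on frame-aligned emotional/neutral pairs), speaker models are assumed to be diagonal-covariance GMMs, feature extraction is omitted, and the function names log_gauss_full, jd_gmm_map, gmm_loglik_diag and identify are hypothetical.

```python
import numpy as np

# Minimal sketch (assumption): standard JD-GMM MMSE mapping plus GMM scoring.
# Function names are hypothetical; all model parameters are assumed pre-trained.


def log_gauss_full(x, mean, cov):
    """Log-density of a full-covariance Gaussian at each row of x (T, d)."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)                   # cov = L @ L.T
    z = np.linalg.solve(chol, (x - mean).T)          # (d, T) whitened residuals
    maha = np.sum(z * z, axis=0)                     # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def jd_gmm_map(x, weights, means, covs):
    """Map source frames x (T, d) to E[y | x] under a joint GMM over [x; y].

    weights: (K,), means: (K, 2d), covs: (K, 2d, 2d). Assumes y has the
    same dimension d as x (e.g. an MFCC-to-MFCC mapping).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    # Component posteriors P(k | x), using only the x-marginal of each component.
    log_post = np.stack(
        [np.log(weights[k]) + log_gauss_full(x, means[k, :d], covs[k, :d, :d])
         for k in range(len(weights))],
        axis=1,
    )                                                # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)  # avoid underflow
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    y = np.zeros_like(x)
    for k in range(len(weights)):
        mu_x, mu_y = means[k, :d], means[k, d:]
        s_xx, s_yx = covs[k, :d, :d], covs[k, d:, :d]
        # Per-component linear regression: mu_y + S_yx S_xx^{-1} (x - mu_x)
        cond = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y += post[:, k:k + 1] * cond                 # posterior-weighted sum
    return y


def gmm_loglik_diag(x, weights, means, variances):
    """Average per-frame log-likelihood of x (T, d) under a diagonal GMM."""
    diff = x[:, None, :] - means[None, :, :]         # (T, M, d)
    log_comp = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )                                                # (T, M)
    m = log_comp.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))  # log-sum-exp
    return float(frame_ll.mean())


def identify(x, speaker_models):
    """Index of the speaker GMM (weights, means, variances) scoring x highest."""
    scores = [gmm_loglik_diag(x, *model) for model in speaker_models]
    return int(np.argmax(scores))
```

One plausible pipeline (an assumption, not confirmed by this record) passes test-utterance MFCCs through jd_gmm_map before scoring them with identify. Because the mapping depends only on the joint GMM, it needs no enrollment data from the target speakers, which fits the abstract's description of a speaker-independent mapping.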
Description:	This work was presented as a paper at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), held in Vancouver, Canada, on 26-31 May 2013.
URI:	https://doi.org/10.1109/ICASSP.2013.6639228; http://hdl.handle.net/11452/32501
ISSN:	1520-6149
Appears in Collections:	Scopus; Web of Science

Files in This Item:

File	Description	Size	Format
Hanilçi_vd_2013.pdf		563.22 kB	Adobe PDF

This item is licensed under a Creative Commons License