Statistical pronunciation modeling for non-native speech processing (Q2430551)

The purpose of this book (and of the research behind it) is to provide a method for adapting an automatic speech recognition (ASR) system so that it can recover some of the errors caused by non-native pronunciation. The constraints of the pronunciation dictionary are relaxed for the task of non-native speech recognition. Then, by training on a sample of non-native speech, the pronunciation error patterns specific to each accent are captured implicitly, without any attempt to represent them explicitly. The main contributions of this work on the statistical modeling of non-native pronunciation are summarized as follows: (a) A novel approach to handling the pronunciation variations of non-native speech is designed, implemented and evaluated; it is an implicit, statistical technique that uses discrete hidden Markov models (HMMs) as a statistical dictionary. The proposed models are shown to be effective in increasing the recognition rates of non-native speakers, independently of the accent and without using specific expert knowledge. (b) A large database of non-native speech is collected from about 100 speakers in five accent groups, recorded under the most appropriate conditions and including pronunciation ratings by human experts. (c) While general pronunciation networks have been proposed before, modeling non-native pronunciations with word-level HMMs and applying them for rescoring is an original approach. (d) The evaluation shows that the proposed HMM technique is effective and can accommodate any pronunciation variation, whether or not that variation was observed in the training data. (e) The proposed method does not decrease recognition performance through additional confusions, which is an important problem for the common approaches based on generating phoneme confusion rules. (f) Being fully data-driven, the proposed HMM method can be applied to any accent, i.e., any pairing of a speaker's native language with the non-native language being spoken, without any need for expert knowledge about that accent.

The book is organized into seven chapters and four appendices. The short Chapter 1 introduces the problem under investigation and the proposed solution. Chapter 2, Automatic speech recognition, outlines the basics of ASR, while Chapter 3 describes non-native speech and its challenges. Chapter 4, Pronunciation variation modeling in the literature, surveys the literature and the state of the art in non-native speech recognition. Chapter 5 presents the collection and properties of the large non-native speech database, a significant contribution of this book. Chapter 6, Handling non-native speech, covers rule-based phoneme lattice processing, multilingual weighted codebooks for semi-continuous HMM-based recognizers, automatic scoring of non-native pronunciation with mispronounced-word detection, and prosodic analysis of non-verbal utterances, which contributes to a deeper understanding of non-phonemic effects. Chapter 7 explains how pronunciation HMMs are generated, describes their training and evaluation as a statistical lexicon for non-native speech recognition, and shows how they are applied to increase recognition rates for non-native speech. A critical discussion and future directions for this research are also presented. Four appendices are included: (A) Hotel reservation dialog; (B) Confusion matrices; (C) Speaker information; (D) Human evaluation.
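The central idea of points (a), (c) and (d), a word-level discrete HMM that scores the phoneme string a speaker actually produced and is then used to rescore recognizer hypotheses, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the book's implementation: the left-to-right topology with one state per baseform phoneme (self-loops standing in for insertions, skips for deletions), the hand-set probabilities that would in practice be estimated from non-native training data, the toy phoneme alphabet, the made-up acoustic scores, and the hypothetical helpers `make_pronunciation_hmm`, `log_forward` and `rescore`. Because every state gives a small non-zero probability to every phoneme, any pronunciation variant receives a finite score, even one never seen in training.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: (emission table per state, transition matrix).
HMM = Tuple[List[Dict[str, float]], List[List[float]]]


def make_pronunciation_hmm(baseform: Sequence[str], alphabet: Sequence[str],
                           p_correct: float = 0.8, p_stay: float = 0.1,
                           p_skip: float = 0.1) -> HMM:
    """Left-to-right discrete HMM, one state per canonical phoneme (assumed topology)."""
    n = len(baseform)
    other = (1.0 - p_correct) / (len(alphabet) - 1)
    # Each state emits its canonical phoneme with p_correct and every other
    # phoneme with a small floor, so unseen substitutions still get a score.
    emit = [{ph: (p_correct if ph == canon else other) for ph in alphabet}
            for canon in baseform]
    trans = []
    for i in range(n):
        row = [0.0] * n
        if i == n - 1:
            row[i] = 1.0                      # final state: self-loop only
        elif i == n - 2:
            row[i] = p_stay                   # self-loop stands in for insertions
            row[i + 1] = 1.0 - p_stay
        else:
            row[i] = p_stay
            row[i + 1] = 1.0 - p_stay - p_skip
            row[i + 2] = p_skip               # skip stands in for a deleted phoneme
        trans.append(row)
    return emit, trans


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_forward(hmm: HMM, observed: Sequence[str]) -> float:
    """Forward algorithm: log P(observed phonemes | word HMM), ending in the last state."""
    emit, trans = hmm
    n = len(emit)
    neg_inf = float("-inf")
    alpha = [neg_inf] * n
    alpha[0] = math.log(emit[0][observed[0]])   # assume decoding starts in state 0
    for ph in observed[1:]:
        new_alpha = [neg_inf] * n
        for j in range(n):
            terms = [alpha[i] + math.log(trans[i][j])
                     for i in range(n) if trans[i][j] > 0.0 and alpha[i] > neg_inf]
            if terms:
                new_alpha[j] = logsumexp(terms) + math.log(emit[j][ph])
        alpha = new_alpha
    return alpha[n - 1]


def rescore(nbest: List[Tuple[str, float]], observed: Sequence[str],
            lexicon: Dict[str, HMM], weight: float = 1.0) -> str:
    """Pick the hypothesis maximizing acoustic log score + weight * pronunciation log score."""
    return max(nbest, key=lambda h: h[1] + weight * log_forward(lexicon[h[0]], observed))[0]


alphabet = ["k", "ae", "ah", "t", "e"]
lexicon = {
    "cat": make_pronunciation_hmm(["k", "ae", "t"], alphabet),
    "cut": make_pronunciation_hmm(["k", "ah", "t"], alphabet),
}
nbest = [("cut", -10.0), ("cat", -10.5)]   # made-up acoustic log scores
print(rescore(nbest, ["k", "ae", "t"], lexicon))  # → cat
```

The toy observation here is simply the canonical pronunciation of "cat", which lets the pronunciation score overturn the acoustically preferred "cut". In the approach described above, the emission and transition probabilities would instead be trained on non-native speech, so that accent-specific substitutions raise the score of the intended word; the log-linear `weight` is likewise a hypothetical tuning parameter.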

Reviewed by Neculai Curteanu

zbMATH Keywords

automatic speech recognition

non-native speech recognition methods

hidden Markov model (HMM)

HMM as statistical dictionary