The integration of phonetic knowledge in speech technology. (Q1768193)
scientific article; zbMATH DE number 2145752
Statements
The integration of phonetic knowledge in speech technology. (English)
14 March 2005
The main purpose of this book is to present a set of ten high-quality papers on the integration of phonetic knowledge in speech technology. The challenging question is not whether, but how much of the linguistic field of phonetics and phonology, the study of the sound structures of spoken language, can be incorporated in speech technology. The contributions to this volume show convincingly that phonetic knowledge already permeates speech technology, though it is not clear to what extent or with what relevance. The volume contains the contributions and panel discussions on this topic held at Eurospeech 2001 in Aalborg (Denmark). It comprises the following chapters.

Chapter 1: Phonetic knowledge in speech technology -- and phonetic knowledge from speech technology? (William Barry, Wim A. van Dommelen, Jacques Koreman). This paper surveys the contributions to the volume and their main ideas, framed by the essential question: "What sort of phonetic knowledge is relevant to speech technology?"

Chapter 2: Can phonetic knowledge be used to improve the performance of speech recognizers and synthesizers? (W. A. Ainsworth). The author (alone, and so regrettably lost, among the other contributors) stresses the need to develop new mathematical models that capture the relationship between the temporal overlap of the underlying articulatory gestures and the resulting surface acoustic signal.

Chapter 3: Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground? (Anton Batliner, Bernd Möbius). This paper addresses the integration of phonetic knowledge in both speech recognition and synthesis. The authors argue for the use of prosodic knowledge rather than prosodic models in automatic speech understanding, and support the need to identify clear and stable prosodic markers in order to define phrase boundaries and intonationally (thus informationally!) important elements.

Chapter 4: Phonetic time maps (Julie Carson-Berndsen, Michael Walsh). This chapter presents a constraint-based (time map) model for representing the phonetic constraints of a language; phonotactic automata are defined with respect to the syllable domain. Particularly interesting is that the constraint-based processing framework can operate both on categorical representations and on probabilistic inputs that rank the constrained phonetic / phonological information, which improves robustness in speech recognition.

Chapter 5: Introducing phonetically motivated, heterogeneous information into automatic speech recognition (Heidi Christensen, B. Lindberg, O. Andersen). This paper describes an automatic speech recognition system whose central issue is the exploration of multi-source recognition (called heterogeneous processing). Two types of expert configuration (voicing and broad class) are used with the aim of increasing recognition performance and improving noise robustness.

Chapter 6: Introducing contextual transcription rules in large vocabulary speech recognition (G. Gravier, F. Yvon, B. Jacob, F. Bimbot). The authors support the integration of contextual phonological rules in the beam-search algorithm of a large-vocabulary speech recognition system for French. An interesting discussion highlights the interactions between phonetic factors (production task and speaking style), phonetic modelling complexity, lexicon resources, and constraint definitions.

Chapter 7: From here to utility -- melding phonetic insight with speech technology (Steven Greenberg). The paper reveals the fundamental importance of the two-way relationship between speech science and technology, i.e. of melding phonetic insight with speech technology to improve both the applications and the basic science. This approach is illustrated by the relation between prosodic and phonetic properties of conversational telephone dialogues (American English) in the Switchboard corpus. Phonetic properties are shown to reflect prosodic phenomena, and can be used to enhance the quality of automatic speech recognition and to better understand the nature of spoken language.

Chapter 8: Pronunciation modeling (M. Pastor, F. Casacuberta). In this paper, word pronunciations are modelled using stochastic finite-state automata on the basis of three criteria: number of pronunciations, cumulative percentage, and threshold percentage (which proved to be the most effective). The study confirms that repeated use of the same word results in a variety of forms, and that the more a word is used, the more likely it is to deviate from the canonical form (!). The proposed models were applied in a translation-oriented speech task, with an improvement that depended on the language model assigned.

Chapter 9: Phonetic knowledge in text-to-speech synthesis (Jan P. H. van Santen). This chapter focuses on the value of phonetics for speech technology in general and for text-to-speech synthesis in particular. The author considers that computational phonetics should provide the kind of phonetic knowledge (such as speech production / perception studies, architectural design, language-dependent details, mathematical models) that can illustrate progress within specific domains such as text analysis (computing phonemes, prosodic tags), duration modelling, intonation modelling, and signal processing (articulatory facts, segment lengthening details etc.).

Chapter 10: Is phonetic knowledge of any use for speech technology? (Helmer Strik). The author pertinently observes that, while more phonetic knowledge should be incorporated in speech technology, the actual amount of phonetic knowledge used in speech technology has decreased (!) over the years. Several examples are analyzed in support of this conclusion and to identify some of the reasons why the transfer of phonetic knowledge to speech technology is problematic: (a) different approaches are used in the fields of phonetics and speech technology, (b) phonetic knowledge is based on small amounts of 'lab speech' and therefore does not generalize to 'real speech', (c) the knowledge is not complete, and (d) the knowledge is not quantified in the right format.
phonetic knowledge
speech technology
automatic speech recognition and understanding
speech synthesis
prosodic models
text-to-speech synthesis systems
computational phonetics
sound structures of spoken language
phonetics
phonology