Subband architecture for automatic speaker recognition. (Q1575498)

scientific article; zbMATH DE number 1493335

      Statements

      Subband architecture for automatic speaker recognition. (English)
      21 August 2000
      We present an original approach to automatic speaker identification that is especially applicable to environments which partially corrupt the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands, apply statistical recognizers independently to each subband, and then recombine their outputs to yield a global score and a global recognition decision. The choice of subband architecture and of recombination strategy is discussed in detail. This technique has been shown to be robust for speech recognition when narrow-band noise degradation occurs. We first objectively verify this robustness for the speaker identification task. We also study which information is actually used to recognize speakers. To this end, speaker identification experiments on independent subbands are conducted for the 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among subbands. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a redundant parallel architecture in which most of the correlations are kept. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional fullband recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional device, especially when some of the recognizers are pruned.
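
The recombination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `recombine_scores` and `identify`, the uniform default weights, the weight renormalization after pruning, and the toy scores are all assumptions made for illustration. The sketch takes one score per speaker from each subband recognizer, combines them linearly into a global score, and optionally prunes subbands that are believed to be corrupted.

```python
from typing import List, Optional, Sequence


def recombine_scores(
    subband_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    keep: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Linearly recombine per-subband speaker scores into global scores.

    subband_scores[b][s] is the score (for example a log-likelihood) that
    the recognizer for subband b assigns to speaker s.
    weights[b] is a hypothetical recombination weight (uniform if omitted).
    keep[b] = False prunes the recognizer for subband b.
    """
    n_bands = len(subband_scores)
    if n_bands == 0:
        raise ValueError("at least one subband is required")
    n_speakers = len(subband_scores[0])
    if weights is None:
        weights = [1.0] * n_bands
    if keep is None:
        keep = [True] * n_bands

    # Only the subbands that survive pruning contribute to the global score.
    active = [b for b in range(n_bands) if keep[b]]
    if not active:
        raise ValueError("all subbands were pruned")
    total_weight = sum(weights[b] for b in active)
    if total_weight == 0:
        raise ValueError("active weights sum to zero")

    # Weighted average over active subbands (assumption: weights are
    # renormalized after pruning so scores stay on a comparable scale).
    return [
        sum(weights[b] * subband_scores[b][s] for b in active) / total_weight
        for s in range(n_speakers)
    ]


def identify(global_scores: Sequence[float]) -> int:
    """Return the index of the speaker with the highest global score."""
    return max(range(len(global_scores)), key=lambda s: global_scores[s])


# Toy data (invented): 3 subbands, 2 speakers. The third subband is assumed
# to be corrupted by narrow-band noise and strongly penalizes speaker 0.
scores = [
    [-1.0, -2.0],
    [-1.5, -1.0],
    [-9.0, -0.5],
]

all_bands = identify(recombine_scores(scores))
pruned = identify(recombine_scores(scores, keep=[True, True, False]))
print(all_bands, pruned)
```

With these invented numbers, the corrupted third subband dominates the unpruned average and the decision goes to speaker 1, whereas pruning that subband returns the decision to speaker 0. This mirrors, in a very simplified way, the abstract's remark that pruning some recognizers helps in noisy conditions.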
      Speaker identification
      Subband architecture
      Recombination
      Parallel model
      Correlation
      Noisy speech

      Identifiers