Subband architecture for automatic speaker recognition. (Q1575498)

From MaRDI portal
scientific article

    Statements

    Subband architecture for automatic speaker recognition. (English)
    21 August 2000
    We present an original approach to automatic speaker identification that is especially applicable to environments which partially corrupt the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands on which statistical recognizers are independently applied; their outputs are then recombined to yield a global score and a global recognition decision. The choice of subband architecture and of recombination strategy is discussed in detail. This technique has been shown to be robust for speech recognition under narrow-band noise degradation. We first objectively verify this robustness for the speaker identification task. We also study which information is actually used to recognize speakers. To this end, speaker identification experiments on independent subbands are conducted for the 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among subbands. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a particularly redundant parallel architecture in which most of the correlations are kept. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional fullband recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional system, especially when some recognizers are pruned.
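The recombination step described in the abstract can be illustrated with a minimal sketch. It assumes each subband recognizer has already produced one log-likelihood-style score per speaker, and it combines them with a weighted linear sum, optionally pruning some subbands. The function names (`recombine_scores`, `identify`), the equal default weights, the weight normalization, and the toy scores are all hypothetical choices made for illustration; the abstract does not specify the recognizers, the weights, or the pruning criterion actually used.

```python
from typing import List, Optional, Sequence


def recombine_scores(
    subband_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    keep: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Linearly recombine per-subband scores into one global score per speaker.

    subband_scores[b][s] is the score of speaker s from the recognizer of
    subband b (assumed here to be a log-likelihood; higher is better).
    weights[b] is the linear weight of subband b (assumed equal by default).
    keep[b] is False for a subband whose recognizer is pruned.
    """
    n_bands = len(subband_scores)
    if n_bands == 0:
        raise ValueError("at least one subband is required")
    n_speakers = len(subband_scores[0])
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    if keep is None:
        keep = [True] * n_bands

    # Only the recognizers that survive pruning contribute to the sum.
    active = [b for b in range(n_bands) if keep[b]]
    if not active:
        raise ValueError("all subbands were pruned")

    # Renormalize so the remaining weights sum to one (an illustrative choice).
    total_weight = sum(weights[b] for b in active)
    return [
        sum(weights[b] * subband_scores[b][s] for b in active) / total_weight
        for s in range(n_speakers)
    ]


def identify(global_scores: Sequence[float]) -> int:
    """Global decision: index of the speaker with the highest global score."""
    return max(range(len(global_scores)), key=lambda s: global_scores[s])


# Toy data: 3 subbands, 2 speakers. The third subband is imagined to be
# corrupted by narrow-band noise and strongly penalizes speaker 0.
scores = [
    [-1.0, -2.0],
    [-1.5, -1.0],
    [-9.0, -0.5],
]

all_bands = identify(recombine_scores(scores))
pruned = identify(recombine_scores(scores, keep=[True, True, False]))
print(all_bands, pruned)
```

In this toy case the corrupted subband dominates the unpruned sum and changes the decision, while pruning it restores the decision supported by the two clean subbands. How to detect which recognizers to prune is left open here, as it is not described in the abstract.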
    Speaker identification
    Subband architecture
    Recombination
    Parallel model
    Correlation
    Noisy speech

    Identifiers