Pitch correlogram clustering for fast speaker identification (Q2570287)

From MaRDI portal
Property / Wikidata QID: Q56945774 (normal rank)
Property / MaRDI profile type: MaRDI publication profile (normal rank)
Property / DBLP publication ID: journals/ejasp/JhanwarR04 (normal rank)

Latest revision as of 07:05, 13 November 2024

Language: English
Label: Pitch correlogram clustering for fast speaker identification
Description: scientific article

    Statements

    Pitch correlogram clustering for fast speaker identification (English)
    28 October 2005
    Summary: Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criteria and only a single-cluster GMM is run in every test, have been suggested in the literature to speed up the identification process. However, most of the clustering algorithms used have shown limited success, apparently because the clustering and GMM feature spaces are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram, which captures the frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics such as cepstral coefficients, spectral slopes, and so forth. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
    speaker identification
    clustering
    pitch
    correlogram
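
The two-stage scheme outlined in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several pieces are assumptions made only for illustration: the "pitch correlogram" is taken here to be a normalized 2-D histogram of consecutive-frame pitch pairs (the summary does not give the exact definition), the clustering is plain k-means, and a single diagonal Gaussian per speaker stands in for a full GMM. The function names, bin count, and pitch range are all hypothetical.

```python
import numpy as np

def pitch_correlogram(pitch, bins=8, lo=50.0, hi=400.0):
    """Assumed definition: normalized 2-D histogram of (pitch[t], pitch[t+1])."""
    pitch = np.asarray(pitch, dtype=float)
    x, y = pitch[:-1], pitch[1:]
    voiced = (x > 0) & (y > 0)  # assumption: unvoiced frames are coded as 0
    hist, _, _ = np.histogram2d(x[voiced], y[voiced], bins=bins,
                                range=[[lo, hi], [lo, hi]])
    total = hist.sum()
    return (hist / total).ravel() if total > 0 else hist.ravel()

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, dist.argmin(axis=1)

def gaussian_loglik(frames, mean, var):
    """Log-likelihood of frames under one diagonal Gaussian (stand-in for a GMM)."""
    terms = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)
    return float(terms.sum())

def identify(test_pitch, test_frames, centers, labels, models):
    """Stage 1: pick the nearest pitch cluster. Stage 2: score only its speakers."""
    c = pitch_correlogram(test_pitch)
    cluster = int(((centers - c) ** 2).sum(axis=1).argmin())
    candidates = [s for s in range(len(labels)) if labels[s] == cluster]
    if not candidates:  # fall back to the full database if the cluster is empty
        candidates = list(range(len(labels)))
    scores = {s: gaussian_loglik(test_frames, *models[s]) for s in candidates}
    return max(scores, key=scores.get), candidates
```

Here `models` is a hypothetical list of `(mean, variance)` pairs, one per enrolled speaker, and `labels` holds each speaker's cluster index from the enrollment correlograms. Any run-time saving in this sketch comes from stage 2 evaluating only the speakers in the selected cluster instead of the whole database; a test utterance assigned to the wrong cluster cannot be recovered in stage 2, which is why the choice of clustering feature matters.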

    Identifiers