Converting numerical classification into text classification (Q1853681)

From MaRDI portal
scientific article

    Statements

    22 January 2003
    Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, using a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach to using text classification methods for supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data, text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data, adding numerical features using our approach can improve performance over not adding those features.
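The abstract does not spell out how numerical features become bag-of-words features, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's actual procedure. It assumes equal-frequency binning and emits one pseudo-word per numerical feature naming the bin the value falls into. The function names (`equal_frequency_cuts`, `numeric_to_tokens`, `example_to_bag`) and the `name_binK` token format are hypothetical, chosen for illustration.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points taken at evenly spaced ranks of the sorted values.

    Assumption: equal-frequency binning; the paper may choose split points differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def numeric_to_tokens(name, value, cuts):
    """Map one numeric value to a pseudo-word naming the bin it falls into."""
    # bisect_right counts the cut points that are <= value, which is the bin index.
    bin_index = bisect_right(cuts, value)
    return [f"{name}_bin{bin_index}"]


def example_to_bag(numeric, text, cuts_by_feature):
    """Combine real words and numeric pseudo-words into a single token list."""
    tokens = text.lower().split()
    for name, value in numeric.items():
        tokens.extend(numeric_to_tokens(name, value, cuts_by_feature[name]))
    return tokens


if __name__ == "__main__":
    # Toy training values for a hypothetical "age" feature.
    cuts = {"age": equal_frequency_cuts([1, 2, 3, 4, 5, 6, 7, 8], 4)}
    print(example_to_bag({"age": 4}, "Patient reports mild pain", cuts))
```

In contrast to the numbers-as-tokens representation, where `4` and `4.1` would be unrelated words, nearby values here share a token, so a text classifier can generalize across them. The resulting token list can be passed to any bag-of-words text classifier.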
    machine learning
    text classification
    information retrieval

    Identifiers