Converting numerical classification into text classification (Q1853681): Difference between revisions
From MaRDI portal
ReferenceBot (talk | contribs) Changed an Item |
Set OpenAlex properties. |
||
Property / full work available at URL | |||
Property / full work available at URL: https://doi.org/10.1016/s0004-3702(02)00359-4 / rank | |||
Normal rank | |||
Property / OpenAlex ID | |||
Property / OpenAlex ID: W2116964115 / rank | |||
Normal rank |
Latest revision as of 12:10, 30 July 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Converting numerical classification into text classification |
scientific article |
Statements
Converting numerical classification into text classification (English)
0 references
22 January 2003
0 references
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, the use of a text-classification system on this is a bit more problematic---in the most straight-forward approach each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for the use of text classification methods for supervised learning problems with numerical-valued features in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data the results of text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data adding numerical features using our approach can improve performance over not adding those features.
0 references
machine learning
0 references
text classification
0 references
information retrieval
0 references