textTinyR (Q31261)

From MaRDI portal
Text Processing for Small or Big Data Files
Language: English
Label: textTinyR
Description: Text Processing for Small or Big Data Files

    Statements

    1.1.7    26 October 2021
    1.0.0    7 January 2017
    1.0.1    11 January 2017
    1.0.2    20 January 2017
    1.0.3    29 January 2017
    1.0.4    28 March 2017
    1.0.5    1 April 2017
    1.0.6    3 May 2017
    1.0.7    5 June 2017
    1.0.8    31 October 2017
    1.0.9    16 January 2018
    1.1.0    3 April 2018
    1.1.1    17 May 2018
    1.1.2    25 July 2018
    1.1.3    14 April 2019
    1.1.4    5 May 2021
    1.1.5    13 October 2021
    1.1.6    21 October 2021
    1.1.8    4 December 2023
    4 December 2023
    The package offers functions for splitting, parsing, and tokenizing big text data files and for creating a vocabulary from them. It includes functions for building a document-term matrix and extracting information from it (term associations, most frequent terms). It also includes functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. Lastly, it includes functions for word vector representations (such as 'GloVe' and 'fasttext') and for calculating (pairwise) text document dissimilarities. The source code is written in 'C++11' and exposed to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

    Identifiers