textTinyR




swMATH: 19434, CRAN: textTinyR, MaRDI QID: Q31261, FDO: Q31261

Text Processing for Small or Big Data Files

Lampros Mouselimis

Last update: 4 December 2023

Copyright license: GNU General Public License, version 3.0

Software version identifier: 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8

Source code repository: https://github.com/cran/textTinyR

The package offers functions for splitting, parsing and tokenizing big text data files and for creating a vocabulary from them. It also includes functions for building a document-term matrix and extracting information from it (term associations, most frequent terms), for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. Lastly, it includes functions for word vector representations (such as 'GloVe' and 'fasttext') and for calculating (pairwise) text document dissimilarities. The source code is written in 'C++11' and exposed to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
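
To illustrate what a document-term matrix and a "most frequent terms" query are, here is a minimal sketch in plain Python. It is not textTinyR's API: the function names (tokenize, document_term_matrix, most_frequent_terms) and the toy documents are hypothetical, and the tokenizer is a deliberately naive whitespace splitter rather than the package's C++ implementation.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (naive)."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(".,;:!?\"'()")
        if tok:
            tokens.append(tok)
    return tokens


def document_term_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # alphabetical column order
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def most_frequent_terms(vocab, matrix, n=3):
    """Sum each column and return the n terms with the highest totals."""
    totals = [sum(col) for col in zip(*matrix)]
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(zip(vocab, totals), key=lambda p: (-p[1], p[0]))
    return ranked[:n]


# Hypothetical toy corpus, for illustration only.
docs = ["The cat sat.", "The cat ran.", "A dog ran."]
vocab, dtm = document_term_matrix(docs)
top = most_frequent_terms(vocab, dtm, n=3)
```

A dense list of lists is used here only for readability. For large corpora most entries are zero, which is presumably why the package pairs its document-term matrix functions with sparse-matrix utilities.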





