cleanNLP

From MaRDI portal
Software:31292



swMATH19465CRANcleanNLPMaRDI QIDQ31292

A Tidy Data Model for Natural Language Processing

Taylor B. Arnold

Last update: 16 November 2023

Software version identifier: 0.24, 1.5.2, 1.9.0, 1.10.0, 2.0.3, 2.3.0, 3.0.0, 3.0.2, 3.0.3, 3.0.4, 3.0.7

Source code repository: https://github.com/cran/cleanNLP

Copyright license: GNU Library General Public License, version 2.0

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may choose either the 'udpipe' back end, which has no external dependencies, or one of two Python back ends: 'spaCy' <https://spacy.io> or 'CoreNLP' <https://stanfordnlp.github.io/CoreNLP/>. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, and dependency parsing.
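The "normalized tables" idea can be illustrated with a minimal sketch, assuming a toy regex tokenizer in place of a real back end. Each token becomes one row keyed by a document id (`doc_id`), a sentence index (`sid`) and a token index (`tid`). The function `token_table`, its splitting rules and its column names are hypothetical and chosen for illustration only; they are not cleanNLP's API (cleanNLP itself is an R package), and a real back end would add further annotation columns such as part-of-speech tags.

```python
import re
from typing import Dict, List

# Hypothetical toy rules, standing in for a real back end (udpipe, spaCy,
# CoreNLP) only to show the shape of the output table:
# sentences end at ., ! or ? followed by whitespace;
# tokens are runs of word characters or single punctuation marks.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def token_table(corpus: Dict[str, str]) -> List[dict]:
    """Return one row per token, keyed by document, sentence and token ids."""
    rows = []
    for doc_id, text in corpus.items():
        # Drop empty strings so blank documents contribute no sentences.
        sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
        for sid, sentence in enumerate(sentences, start=1):
            for tid, token in enumerate(TOKEN_PATTERN.findall(sentence), start=1):
                rows.append({"doc_id": doc_id, "sid": sid, "tid": tid, "token": token})
    return rows


if __name__ == "__main__":
    corpus = {
        "d1": "Tidy data is simple. Each row is one token!",
        "d2": "Tables join easily.",
    }
    for row in token_table(corpus):
        print(row["doc_id"], row["sid"], row["tid"], row["token"])
```

Because every row carries the same keys, other annotation tables in this layout (for example, one for named entities) can be joined back to the tokens on `doc_id` and `sid` rather than being nested inside per-document objects.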