piecemaker
Software:111109
CRAN: piecemaker, MaRDI QID: Q111109
Tools for Preparing Text for Tokenizers
Last update: 2 June 2023
Copyright license: Apache License
Software version identifier: 1.0.0, 1.0.1, 1.0.2
Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
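To make the idea of shared preparation steps concrete, here is a minimal Python sketch of the kind of clean-up that often precedes tokenization: dropping control characters, spacing out punctuation, collapsing whitespace, then splitting on spaces. It is an illustration only. The function names `prepare_text_sketch` and `tokenize_on_spaces` are hypothetical and are not the package's R API, and the particular steps chosen here are an assumption about what "preparation" typically involves.

```python
import re
import string
import unicodedata


def prepare_text_sketch(text: str) -> str:
    """Apply a few common pre-tokenization clean-up steps (hypothetical helper)."""
    # Drop control characters, but keep ordinary whitespace for now.
    text = "".join(
        ch for ch in text
        if ch in "\t\n\r " or unicodedata.category(ch) != "Cc"
    )
    # Surround each ASCII punctuation mark with spaces so it becomes its own piece.
    punct_class = "[" + re.escape(string.punctuation) + "]"
    text = re.sub("(" + punct_class + ")", r" \1 ", text)
    # Collapse runs of whitespace to single spaces and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def tokenize_on_spaces(text: str) -> list[str]:
    """Prepare the text, then split it on single spaces (hypothetical helper)."""
    prepared = prepare_text_sketch(text)
    return prepared.split(" ") if prepared else []


if __name__ == "__main__":
    print(tokenize_on_spaces("Hello,\x00  world!\n"))
```

Keeping preparation separate from splitting means the same clean-up could, in principle, feed a different tokenizer without being rewritten.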