piecemaker

From MaRDI portal
Revision as of 18:56, 12 March 2024 by Import240312060351 (created automatically from import240312060351)

Software:111109



CRAN: piecemaker
MaRDI QID: Q111109

Tools for Preparing Text for Tokenizers

Jon Harmon, Jonathan Bratt

Last update: 2 June 2023

Copyright license: Apache License

Software version identifier: 1.0.0, 1.0.1, 1.0.2

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
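As a rough illustration of what "shared preparation steps" followed by a simple tokenizer might look like, here is a minimal Python sketch. It is not piecemaker's API (the package itself is written in R). The function names `prepare_text` and `tokenize_space` and the particular steps chosen (dropping control characters, spacing out punctuation, collapsing whitespace) are assumptions made for this example only.

```python
import re
import unicodedata
from typing import List


def prepare_text(text: str) -> str:
    """Apply a few common pre-tokenization cleanup steps (hypothetical helper)."""
    # Drop control/format characters (Unicode categories starting with "C"),
    # but keep anything Python treats as whitespace, such as tabs and newlines.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Surround each punctuation or symbol character with spaces so it
    # becomes its own piece when the text is split.
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    # Collapse every run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def tokenize_space(text: str) -> List[str]:
    """Prepare the text, then split it on whitespace (hypothetical helper)."""
    return prepare_text(text).split()


if __name__ == "__main__":
    print(tokenize_space("Hello,  world!\x00"))
```

Keeping the cleanup in its own function means a different tokenizer could reuse it unchanged, which is the kind of sharing the description above refers to.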





