tokenizers.bpe
Software:83103
CRAN package: tokenizers.bpe
MaRDI QID: Q83103
Byte Pair Encoding Text Tokenization
Last update: 15 September 2023
Copyright license: Mozilla Public License, version 2.0
Software version identifiers: 0.1.0, 0.1.1, 0.1.2, 0.1.3
Unsupervised text tokenizer focused on computational efficiency. It wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
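To show what "Byte Pair Encoding" means in practice, here is a minimal pure-Python sketch of the classic merge-learning procedure described in the cited BPE paper. It is not the tokenizers.bpe R API and not YouTokenToMe's optimized algorithm. The function names (`apply_merge`, `learn_bpe`, `encode`) are hypothetical and exist only for illustration. The toy corpus, the whitespace pre-splitting, and the `</w>` end-of-word marker are assumptions. YouTokenToMe may mark word boundaries differently, so its segmentations will not match this output.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Merge non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn an ordered list of merges (hypothetical helper, toy version)."""
    # Assumption: words are split on whitespace; each word starts as
    # single characters plus an end-of-word marker "</w>".
    word_freq = Counter(corpus.split())
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freq.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(apply_merge(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
        merges.append(best)
    return merges


def encode(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


corpus = " ".join(["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3)
merges = learn_bpe(corpus, 5)
print(merges)
print(encode("lowest", merges))  # a word that never occurs in the corpus
```

With five merges, the unseen word "lowest" is split into the learned subwords `low` and `est</w>`. This is the property that makes BPE useful for open-vocabulary text. Each merge pass here rescans the whole vocabulary. That keeps the sketch short but makes it slow, and faster implementations such as YouTokenToMe exist to avoid exactly this cost.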