tokenizers.bpe

From MaRDI portal
Software:83103



CRAN: tokenizers.bpe
MaRDI QID: Q83103

Byte Pair Encoding Text Tokenization

Jan Wijffels

Last update: 15 September 2023

Copyright license: Mozilla Public License, version 2.0

Software version identifiers: 0.1.0, 0.1.1, 0.1.2, 0.1.3

Unsupervised text tokenizer focused on computational efficiency. It wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, which implements fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
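To illustrate what Byte Pair Encoding learns, here is a minimal Python sketch of the merge-learning loop described in the cited BPE paper. It starts from characters plus an end-of-word marker `</w>` and repeatedly merges the most frequent adjacent symbol pair. The function names (`learn_bpe`, `merge_symbols`, `encode_word`) and the toy corpus are hypothetical and used only for illustration. This is not the tokenizers.bpe R API, and it is not how YouTokenToMe implements BPE internally; that library is optimized C++ and handles word boundaries differently.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with its concatenation."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge operations from a whitespace-separated corpus."""
    # Each distinct word becomes a tuple of characters plus an end-of-word marker.
    word_freqs = Counter(corpus.split())
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_symbols(s, best): f for s, f in vocab.items()}
    return merges


def encode_word(word, merges):
    """Segment a (possibly unseen) word by applying learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = " ".join(["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3)
    merges = learn_bpe(corpus, num_merges=5)
    print(merges)
    # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
    print(encode_word("lowest", merges))
    # ['low', 'est</w>']
```

The unseen word "lowest" is split into the learned subwords "low" and "est</w>". This is how BPE covers rare and out-of-vocabulary words with a fixed-size subword vocabulary.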
