tokenizers.bpe (Q83103)

From MaRDI portal
Byte Pair Encoding Text Tokenization
Language Label Description Also known as
English
tokenizers.bpe
Byte Pair Encoding Text Tokenization

    Statements

    0 references
    0.1.1
    6 January 2023
    0 references
    0.1.0
    2 August 2019
    0 references
    0.1.2
    14 September 2023
    0 references
    0.1.3
    15 September 2023
    0 references
    0 references
    15 September 2023
    0 references
    Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
    0 references
    0 references
    0 references

    Identifiers