tokenizers.bpe (Q83103)

From MaRDI portal





Byte Pair Encoding Text Tokenization
Language Label Description Also known as
default for all languages
No label defined
    English
    tokenizers.bpe
    Byte Pair Encoding Text Tokenization

      Statements

      0 references
      0.1.1
      6 January 2023
      0 references
      0.1.0
      2 August 2019
      0 references
      0.1.2
      14 September 2023
      0 references
      0.1.3
      15 September 2023
      0 references
      0 references
      15 September 2023
      0 references
      Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
      0 references
      0 references
      0 references

      Identifiers