Neural Language Models for Nineteenth-Century English (dataset; language model zoo) (Q6696958)

Dataset published at Zenodo repository.

Language	Label	Description	Also known as
default for all languages	No label defined
English	Neural Language Models for Nineteenth-Century English (dataset; language model zoo)	Dataset published at Zenodo repository.

Statements

instance of

data set

0 references

description

This dataset contains four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and comprising ~5.1 billion tokens. The language model architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the two static models, and four instances covering different time slices for BERT. GitHub repository: https://github.com/Living-with-machines/histLM

0 references

publication date

23 May 2021

0 references

author

Seyed Kasra Hosseini Zad

0 references

Kaspar Beelen

0 references

Giovanni Colavizza

0 references

Mariona Coll Ardanuy