Neural Language Models for Nineteenth-Century English (dataset; language model zoo)
DOI: 10.5281/zenodo.4782245; Zenodo: 4782245; MaRDI QID: Q6696958; FDO: Q6696958
Dataset published in the Zenodo repository.
Mariona Coll Ardanuy, Seyed Kasra Hosseini Zad, Kaspar Beelen, Giovanni Colavizza
Publication date: 23 May 2021
Copyright license: Creative Commons Attribution 4.0 International
This dataset contains four types of neural language models trained on a large historical corpus of English-language books published between 1760 and 1900, comprising approximately 5.1 billion tokens. The architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained one model instance on the whole dataset. Additionally, we trained separate instances of the two static models on text published before 1850, and four BERT instances on different time slices. GitHub repository: https://github.com/Living-with-machines/histLM
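The structure of the model zoo described above can be summarized as a small inventory of training subsets and model instances. The following Python sketch is illustrative only. The `Book` record, the helper names, and the reading of "before 1850" as a strict `year < 1850` cut are assumptions, not taken from the histLM code. The four BERT time slices are left as placeholder labels because their boundaries are not stated in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical minimal record for one book in the corpus."""
    title: str
    year: int  # publication year


# Assumption: the full corpus covers 1760-1900 inclusive, and
# "before 1850" is interpreted as year < 1850.
FULL_RANGE = (1760, 1900)
PRE_1850_END = 1850


def full_subset(books):
    """Books used for the whole-dataset instance of every architecture."""
    lo, hi = FULL_RANGE
    return [b for b in books if lo <= b.year <= hi]


def pre_1850_subset(books):
    """Books used for the extra word2vec and fastText instances."""
    return [b for b in full_subset(books) if b.year < PRE_1850_END]


def model_inventory():
    """Map each architecture to the model instances described in the record."""
    inventory = {}
    # Static models: one whole-dataset instance plus one pre-1850 instance.
    for arch in ("word2vec", "fastText"):
        inventory[arch] = ["1760-1900", "pre-1850"]
    # BERT: one whole-dataset instance plus four time-slice instances
    # (placeholder labels; the actual slice boundaries are not given here).
    inventory["BERT"] = ["1760-1900"] + [f"time slice {i}" for i in range(1, 5)]
    # Flair: one whole-dataset instance only.
    inventory["Flair"] = ["1760-1900"]
    return inventory


if __name__ == "__main__":
    inv = model_inventory()
    for arch, instances in inv.items():
        print(arch, len(instances))
    print("total", sum(len(v) for v in inv.values()))
```

Counting the instances this way gives 2 (word2vec) + 2 (fastText) + 5 (BERT) + 1 (Flair) = 10 models in total.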