Diachronic word embeddings from 19th-century newspapers digitised by the British Library (1800-1919)

Dataset:6704400



DOI: 10.5281/zenodo.7181682 | Zenodo: 7181682 | MaRDI QID: Q6704400 | FDO: Q6704400

Dataset published at Zenodo repository.

Barbara McGillivray, Nilo Pedrazzini

Publication date: 10 October 2022

Copyright license: Creative Commons Attribution 4.0 International



Word vectors related to the paper "Machines in the media: semantic change in the lexicon of mechanization in 19th-century British newspapers" by Nilo Pedrazzini and Barbara McGillivray (2022). The embeddings were trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and the following parameters:

sg = True
min_count = 1
window = 3
vector_size = 200
epochs = 5

The embeddings are divided into periods of ten years each, with the vectors from each decade aligned to those from the most recent decade (1910s) using Orthogonal Procrustes.

See the related GitHub repository for the full documentation: https://github.com/Living-with-machines/DiachronicEmb-BigHistData

Project webpage (Living with Machines): https://livingwithmachines.ac.uk/
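For readers unfamiliar with the alignment step mentioned in the description, the following is a minimal sketch of Orthogonal Procrustes alignment using NumPy. It is not the authors' code (see the GitHub repository for that). The function names procrustes_align and cosine, the toy matrices, and the toy dimensionality of 8 (the released vectors have 200 dimensions) are all illustrative assumptions. The sketch also assumes the two decades' matrices have already been restricted to a shared vocabulary with rows in the same word order.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with an orthogonal matrix.

    Finds the orthogonal R minimising ||source @ R - target|| (Frobenius norm).
    Both arrays have shape (n_shared_words, dim), rows in the same word order.
    """
    # The SVD of the cross-covariance matrix yields the optimal rotation U @ Vt.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-ins for two decades (hypothetical data, not the released vectors).
rng = np.random.default_rng(0)
n_words, dim = 100, 8
target = rng.normal(size=(n_words, dim))        # plays the role of the 1910s space

# Build an "earlier decade" that is the same space under a random rotation.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
source = target @ q.T

aligned, rotation = procrustes_align(source, target)

# After alignment, each word's vector can be compared across decades.
similarity_word0 = cosine(aligned[0], target[0])
```

Because the toy "earlier decade" differs from the reference space only by a rotation, the aligned vectors coincide with the target and the cross-decade cosine similarity of every word is 1 up to floating-point error. With real embeddings the residual differences after alignment are what is read as a signal of semantic change.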






