A collection of text embeddings of the arXiv corpus by title and abstract

From MaRDI portal
Dataset:6673194



DOI: 10.5281/ZENODO.8226457, Zenodo: 8226457, MaRDI QID: Q6673194, FDO: Q6673194

Dataset published in the Zenodo repository.

Dennis Shung, Kisung You

Publication date: 8 August 2023



arXiv is a popular online repository that hosts preprints across many scientific domains. Beyond its role in disseminating up-to-date knowledge in those domains, arXiv is an interesting complex system in its own right from a text-analytics point of view. In this repository, we provide a collection of text embeddings for (almost) all papers in the arXiv corpus, computed from their titles and abstracts, to offer a multi-faceted characterization of scientific knowledge.






