Language modeling with reduced densities
Publication:5087658
DOI: 10.32408/COMPOSITIONALITY-4-1
zbMATH Open: 1489.91211
arXiv: 2007.03834
OpenAlex: W3162871608
MaRDI QID: Q5087658
FDO: Q5087658
Authors: Tai-Danae Bradley, Yiannis Vlassopoulos
Publication date: 1 July 2022
Published in: Compositionality
Abstract: This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also - and quite crucially - because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
Full work available at URL: https://arxiv.org/abs/2007.03834
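The abstract's two key ingredients can be illustrated concretely: a hom-object in a category enriched over the unit interval can be read as the conditional probability that one string extends to another, and the Loewner order compares positive semidefinite operators. The sketch below is illustrative only, not the paper's construction; the toy distribution `p` and the helpers `hom` and `loewner_leq` are hypothetical names chosen for this example, and restricting nonzero homs to prefix extensions is a simplifying assumption.

```python
import numpy as np

# Toy probabilities for a few strings (hypothetical values; any
# distribution on sequences works for this sketch).
p = {"a": 0.5, "ab": 0.3, "abc": 0.12}

def hom(s, t):
    """Hom-object in [0,1]: the conditional probability that s extends to t.
    Nonzero only when s is a prefix of t (a simplifying assumption)."""
    if not t.startswith(s):
        return 0.0
    return p[t] / p[s]

# Enriched composition requires hom(x, y) * hom(y, z) <= hom(x, z);
# for prefix chains with conditional probabilities it holds with equality.
x, y, z = "a", "ab", "abc"
assert hom(x, y) * hom(y, z) <= hom(x, z) + 1e-12

def loewner_leq(A, B, tol=1e-10):
    """Loewner order on positive semidefinite matrices: A <= B iff B - A is PSD,
    checked here via the eigenvalues of the (symmetric) difference."""
    return bool(np.all(np.linalg.eigvalsh(B - A) >= -tol))

A = np.array([[0.2, 0.0], [0.0, 0.1]])
B = np.array([[0.5, 0.1], [0.1, 0.3]])
print(loewner_leq(A, B))  # True: B - A is positive semidefinite
```

In the paper this order on reduced density operators is what supports the entailment interpretation; the matrices above are arbitrary PSD examples standing in for those operators.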
Recommendations
- An enriched category theory of language: from syntax to semantics
- Reasoning about meaning in natural language with compact closed categories and Frobenius algebras
- Distributional sentence entailment using density matrices
- Ambiguity and incomplete information in categorical models of language
- Open system categorical quantum semantics in natural language processing
Cited In (1)
This page was built for publication: Language modeling with reduced densities