Artificial sequences and complexity measures

From MaRDI portal
Publication:4968837

DOI: 10.1088/1742-5468/2005/04/P04002
zbMATH Open: 1456.94030
arXiv: cond-mat/0403233
OpenAlex: W3099675966
MaRDI QID: Q4968837
FDO: Q4968837


Authors: Andrea Baronchelli, Emanuele Caglioti, Vittorio Loreto


Publication date: 9 July 2019

Published in: Journal of Statistical Mechanics: Theory and Experiment

Abstract: In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution and self-consistent classification.
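
The idea of measuring remoteness between two texts through data compression can be illustrated with a minimal sketch. It assumes Python's standard zlib compressor as a stand-in and is not necessarily the procedure used in the paper; the helper names extra_cost and closest_reference are hypothetical. The sketch appends a short probe text to a reference text and counts how many additional compressed bytes the probe requires: the fewer bytes, the more the compressor could reuse patterns learned from the reference.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def extra_cost(reference: str, probe: str) -> int:
    """Extra compressed bytes needed to encode probe after reference.

    Illustrative proxy for relative information content (an assumption
    of this sketch, not the paper's exact definition).
    """
    ref = reference.encode("utf-8")
    both = ref + probe.encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def closest_reference(probe: str, references: dict) -> str:
    """Name of the reference text for which the probe is cheapest to encode."""
    return min(references, key=lambda name: extra_cost(references[name], probe))


if __name__ == "__main__":
    # Toy corpora, invented for this illustration only.
    references = {
        "english": "the quick brown fox jumps over the lazy dog and the cat "
                   "sleeps on the warm windowsill while the rain falls on "
                   "the quiet town ",
        "italian": "la volpe veloce salta sopra il cane pigro e il gatto "
                   "dorme sul davanzale caldo mentre la pioggia cade sulla "
                   "valle tranquilla ",
    }
    probe = "the rain falls on the lazy dog while the cat jumps over the town"
    for name, text in references.items():
        print(name, extra_cost(text, probe))
    print("closest:", closest_reference(probe, references))
```

Picking the reference with the smallest extra cost gives a crude form of language recognition. zlib only looks back over a 32 KB window, so this sketch is only meaningful for short reference texts.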


Full work available at URL: https://arxiv.org/abs/cond-mat/0403233




Cited In (3)

