Introduction to information retrieval and quantum mechanics (Q497293)

The book examines the relation between information retrieval (IR) and quantum mechanics. Information retrieval is the subfield of computer science that deals with the retrieval of text documents that are described by a set of terms (words). The retrieved documents have to be similar to the set of terms given in the query and should be relevant to the user. The sets of relevant and non-relevant documents have to be specified or learned during the information retrieval task. Documents can be described by a set of terms or by a vector. Each dimension of the vector holds the frequency of a term in the document multiplied by the information content of that term with respect to the whole document set. The information is measured by the negative logarithm of the probability of the occurrence of the term in the document set. The similarity between documents is measured as the cosine of the angle between the vectors that describe them. For normalised vectors, this angle determines the Euclidean distance between them, so both measures rank documents identically. This representation is related to the representation of a quantum state by a normalised vector of amplitudes. The book uses the mathematical framework of quantum mechanics to describe information retrieval; it is not about quantum phenomena in information retrieval. In Chapter one, the main approaches to information retrieval are introduced. Relevance feedback is described as a form of user interaction with the information retrieval system. Then the two main approaches are described, Boolean logic and the vector space model. Document queries and Boolean logic are based on sets and operators. Boolean logic is extended through weight functions. Weight functions are represented by sets that are identified with the vector representation.
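The term weighting and the link between cosine similarity and Euclidean distance described above can be illustrated with a minimal sketch. It is not taken from the book: the toy documents and the function names `tfidf_vectors`, `normalise`, `cosine` and `euclidean` are assumptions made for illustration, and the probability of a term is estimated here as the fraction of documents that contain it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one weighted vector per tokenised document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = term frequency * negative log of the estimated term probability.
        vectors.append([tf[t] * -math.log(df[t] / n_docs) for t in vocab])
    return vocab, vectors


def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine of the angle between two unit vectors (their scalar product)."""
    return sum(a * b for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy collection, chosen only for illustration.
docs = [
    "quantum state vector".split(),
    "quantum retrieval vector model".split(),
    "boolean retrieval query".split(),
]
vocab, vectors = tfidf_vectors(docs)
unit = [normalise(v) for v in vectors]

cos_01 = cosine(unit[0], unit[1])
dist_01 = euclidean(unit[0], unit[1])
# For unit vectors: distance^2 = 2 - 2 * cosine.
identity_gap = abs(dist_01 ** 2 - (2 - 2 * cos_01))
```

For unit vectors the squared distance equals 2 minus twice the cosine, so `identity_gap` is zero up to rounding, which is why both measures produce the same ranking.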
Motivated by this identification of weight functions with vectors, the Dirac notation for a normalised complex vector is introduced together with the scalar product and density matrices in Hilbert space. Projectors in the Dirac notation are identified with the classical weight functions used in information retrieval. Then the vector space model as used in information retrieval is described using the Dirac notation, and its relation to Boolean logic is highlighted. After that, probabilistic models are introduced together with relevance feedback, the probability ranking principle and the related language models. The chapter closes with a short introduction to machine learning in information retrieval and suggested readings. Chapter two introduces the principles of quantum mechanics, namely observables, vector representation, tensor product and projectors. It is shown that matrix multiplication is non-commutative, as described by the commutator. Non-commuting projectors are incompatible; this incompatibility plays an important role in information retrieval and in cognitive science models. Quantum bits and the principle of superposition are described, followed by density matrices and the trace. The double-slit experiment is explained together with the interference term, followed by the definition of entanglement and the relation between mixed and pure states. Pure states can be described by vectors. Mixed states cannot; they correspond to a mixture of possible states (vectors) and are represented by density matrices. Chapter three introduces the relations between quantum mechanics and information retrieval. The quantum formalism is related to the vector space model. Gleason's Theorem integrates the three main modelling approaches in information retrieval: the Boolean, the vector space and the probabilistic approach.
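The remarks in Chapter two on non-commuting projectors and on pure versus mixed states can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the book's own presentation: amplitudes are taken to be real, the space is two-dimensional, and the helper names (`projector`, `commutator`, `trace`, `matmul`) are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def projector(v):
    """Projector |v><v| onto the unit vector v (real amplitudes assumed)."""
    return [[x * y for y in v] for x in v]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def commutator(a, b):
    """[A, B] = AB - BA; the zero matrix exactly when A and B commute."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]


zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # equal superposition of zero and one

P_zero, P_one, P_plus = projector(zero), projector(one), projector(plus)

# Projectors onto orthogonal basis vectors commute (compatible events).
comm_compatible = commutator(P_zero, P_one)
# Projectors onto non-orthogonal, non-identical directions do not commute.
comm_incompatible = commutator(P_zero, P_plus)

# A pure state is a single projector; a mixed state is a weighted sum of projectors.
rho_pure = P_plus
rho_mixed = [[0.5 * P_zero[i][j] + 0.5 * P_one[i][j] for j in range(2)]
             for i in range(2)]

# Purity Tr(rho^2) equals 1 for pure states and is smaller for mixed states.
purity_pure = trace(matmul(rho_pure, rho_pure))
purity_mixed = trace(matmul(rho_mixed, rho_mixed))
```

Both density matrices here have the same diagonal, yet only `rho_pure` carries the off-diagonal terms of the superposition; the purity separates the two cases.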
Gleason's Theorem proves that the Born rule for the probability of obtaining specific results for a given measurement follows naturally from the structure formed by the lattice of events in a real or complex Hilbert space. It relates the geometry of subspaces, the probability represented by the trace rule, and the projectors. Gleason's Theorem and its relation to information retrieval are presented: the inner product relates to the vector space model, the projector to Boolean logic, and the vector representation induces the probabilistic model. Incompatibility and its relation to relevance are described in terms of the effects that result from the order in which documents are presented. The section on entanglement and correlation deals with the idea that correlation corresponds to the clustering of documents. Later, a possible implementation of the logic induced by subspaces for information retrieval, based on kinds, is introduced. A kind is the twofold representation of a concept. After that, the combination of concepts is introduced, for example how superposition and entanglement can be used to represent concepts and how the language model can be extended. In the following subsection, the relation between word ambiguity and superposition is investigated. This is followed by a description of semantic spaces and their relation to the concept of entanglement. Contextual search is described by modelling subspaces. The principle of quantum probability ranking with the interference term is described, followed by user interaction through relevance feedback and its relation to the concepts of superposition and entanglement. The chapter closes with some evaluation studies of contextual search models, semantic spaces and the quantum probability ranking principle. The fourth chapter offers some suggestions for future research.
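The trace rule and the order effects mentioned in Chapter three can be sketched numerically. The following is an assumption-laden illustration rather than the book's method: the state `psi`, the directions `topic_a` and `topic_b`, and all function names are hypothetical, amplitudes are real, and the probability of observing one event after another is computed by applying the two projectors in sequence and taking the squared norm.

```python
import math


def projector(v):
    """Projector |v><v| onto the unit vector v (real amplitudes assumed)."""
    return [[x * y for y in v] for x in v]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Product of a matrix with a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def trace_rule(rho, P):
    """Probability Tr(rho P) of the event with projector P in the state rho."""
    return trace(matmul(rho, P))


def sequential_probability(psi, first, second):
    """Squared norm of second * first * psi: `first` observed, then `second`."""
    v = matvec(second, matvec(first, psi))
    return sum(x * x for x in v)


theta = math.pi / 6
psi = [math.cos(theta), math.sin(theta)]        # hypothetical unit state vector
topic_a = [1.0, 0.0]                            # hypothetical event direction
topic_b = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # direction incompatible with topic_a

rho = projector(psi)
P_a, P_b = projector(topic_a), projector(topic_b)

# For a pure state the trace rule reduces to the squared scalar product.
p_a = trace_rule(rho, P_a)
p_a_inner = sum(x * y for x, y in zip(topic_a, psi)) ** 2

# Incompatible projectors give order-dependent probabilities.
p_a_then_b = sequential_probability(psi, P_a, P_b)
p_b_then_a = sequential_probability(psi, P_b, P_a)
```

With these values `p_a` equals the squared cosine of the angle between `psi` and `topic_a`, which ties the probabilistic reading to the vector space model, while `p_a_then_b` and `p_b_then_a` differ because the two projectors do not commute.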
The book closes with an appendix on quantum mechanics and an important question, namely why complex numbers should be used in information retrieval. Real numbers are usually sufficient for conventional information retrieval models. It is suggested that complex numbers may arise from combinations of real numbers or may serve to define transformations that map a basis state into a superposition. The book is a compact introduction to the new field of quantum-like representations in information retrieval.

reviewed by

Andreas Wichert

zbMATH Keywords

information retrieval

quantum formalism

Dirac notation

quantum algorithms