Information extraction: Algorithms and prospects in a retrieval context. (Q2498734)

The best way to show the range of themes studied in the book is to quote the chapter headings: Information extraction and information technology, Information extraction from an historical perspective, The symbolic techniques, Pattern recognition, Supervised classification, Unsupervised classification aids, Integration of information extraction in retrieval models, Evaluation of information extraction technologies, Case studies, The future of information extraction in a retrieval context. After defining and explaining the basic concepts and describing the historical development of the area, the book discusses the most successful past and current algorithms and their application in a variety of domains. Especially important is the explanation of statistical and machine learning algorithms for information detection and classification, and of the integration of their results in probabilistic retrieval models. Alongside the explanatory chapters, the summarizing ones are of great value. Chapter 8 studies possible metrics for performance, computational complexity, linguistic coverage, domain coverage, extensibility, portability, and training time. Chapter 9 presents important case studies: information extraction from news texts, biomedical texts, business texts, legal texts, and informal texts (speech transcription), as well as intelligence gathering. Close attention is paid to biomedical texts, which the author considers to be currently the most challenging for the progress of information extraction. Chapter 10 summarizes the results and their future application to information retrieval and problem solving. The process of information extraction is seen as the inverse of expressing information in text. The genericity versus domain specificity of extraction is considered, as well as the necessity of supplying the computer with man-made metadata (ontologies).
Paraphrasing is compared with information extraction, and ideas for linking information across sentences and across documents are presented. The algorithmic challenges posed by the proposed new methods are considered in the last paragraphs: the choice of features, cascaded models, boundaries of information units, sharable and implicit knowledge, information synthesis (summarization, suppression, disambiguation, inferences), and cross-media synthesis. The book is based on the results of a project on generic technology for information extraction from texts (2000--2004) and on a graduate course on text-based information retrieval, both carried out at the Katholieke Universiteit Leuven, Belgium. Because of its broad coverage and clear, sound explanations, it is suitable and valuable for both researchers and students.

zbMATH Keywords

information extraction

information retrieval

algorithms and methods

MaRDI profile type

MaRDI publication profile

full work available at URL