Cambridge Encyclopedia :: Cambridge Encyclopedia Vol. 36

information retrieval - Performance measures, Open source information retrieval systems, Other retrieval tools, Major Information retrieval research groups

The act of tracing information contained in databases. Applicable in principle to any search for information, the term has been associated since the 1960s with the online technique of scanning and interrogating large computer files for specific data. This may take the form of bibliographic references, full-length documents, or constantly updated information (e.g. share prices). The use of computers makes the process not only quick and relatively cheap, but also thorough and reliable.

Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. Data retrieval, document retrieval, information retrieval, and text retrieval are commonly confused, however, and each of these has its own body of literature, theory, praxis and technologies.

In 1992 the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim was to support the information retrieval community by supplying the infrastructure needed for large-scale evaluation of text retrieval methodologies.

Performance measures

There are various ways to measure how well the retrieved information matches the intended information. The formulas for precision, recall and fall-out are translated from the German Wikipedia article "Recall und Precision".

Precision

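The proportion of retrieved and relevant documents to all the documents retrieved:

\[ \mathrm{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \]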

In binary classification, precision is analogous to positive predictive value.

Note that the meaning and usage of "precision" in the field of Information Retrieval differ from the definition of accuracy and precision within other branches of science and technology.
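As an illustration of the definition above, here is a minimal Python sketch that computes precision from two collections of document identifiers. The function name, the identifiers, and the choice to return 0.0 for an empty result set are assumptions made for this example, not part of the original article.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        # Assumption: precision is undefined for an empty result set;
        # this sketch returns 0.0 by convention.
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document identifiers, for illustration only.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d7"]

# Two of the four retrieved documents (d2, d4) are relevant.
print(precision(retrieved_ids, relevant_ids))
```

Here the result is 2/4 = 0.5. The relevant document d7 was never retrieved, but that does not affect precision.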

Recall

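The proportion of relevant documents that are retrieved, out of all relevant documents available:

\[ \mathrm{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \]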

In binary classification, recall is called sensitivity.
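The following Python sketch illustrates recall as defined above. The function name, the identifiers, and the 0.0 convention for an empty set of relevant documents are assumptions for this example. It also shows that a system can trivially reach a recall of 1.0 by returning the whole collection, which is why recall is normally reported together with precision.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        # Assumption: recall is undefined when nothing is relevant;
        # this sketch returns 0.0 by convention.
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection and relevance judgments, for illustration only.
collection = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
relevant_ids = ["d2", "d4", "d7"]

# Two of the three relevant documents are found.
print(recall(["d1", "d2", "d3", "d4"], relevant_ids))

# Returning every document finds all relevant ones.
print(recall(collection, relevant_ids))
```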

F-measure

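The F-measure is the weighted harmonic mean of precision and recall. The traditional F-measure, or balanced F-score, is:

\[ F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \]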

This is also known as the F1 measure, because recall and precision are evenly weighted.

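The general formula for non-negative real α is:

\[ F_\alpha = \frac{(1 + \alpha^2) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\alpha^2 \cdot \mathrm{precision} + \mathrm{recall}} \]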

Two other commonly used F measures are the F2 measure, which weights recall twice as much as precision, and the F0.5 measure, which weights precision twice as much as recall.
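A small Python sketch of the weighted F-measure may help make the weighting concrete. It assumes the commonly used parameterization in which the weight enters the formula squared, so that a weight of 1 gives the balanced F1 score. The function name, the parameter name `alpha`, and the input values are choices made for this illustration.

```python
def f_measure(precision, recall, alpha=1.0):
    """Weighted harmonic mean of precision and recall.

    Assumes the common form (1 + a^2) * P * R / (a^2 * P + R),
    where a = alpha is a non-negative weight.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    denominator = alpha ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention chosen for this sketch
    return (1 + alpha ** 2) * precision * recall / denominator


# Illustrative values: precision 0.5, recall 2/3.
p, r = 0.5, 2 / 3
print(f_measure(p, r))       # balanced F1
print(f_measure(p, r, 2.0))  # F2, favours recall
print(f_measure(p, r, 0.5))  # F0.5, favours precision
```

With these inputs recall is higher than precision, so under this parameterization F2 comes out above F1 and F0.5 below it.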

Mean average precision

Over a set of queries, find the mean of the average precisions, where Average Precision is the average of the precision after each relevant document is retrieved.

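Where r is the rank, N the number retrieved, rel() a binary function on the relevance of a given rank, and P() precision at a given cut-off rank:

\[ \mathrm{AveP} = \frac{\sum_{r=1}^{N} P(r) \cdot \mathrm{rel}(r)}{\text{number of relevant documents}} \]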
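The average precision described above can be sketched in Python as follows. Each ranked result list is represented as a list of 0/1 relevance flags in rank order. That representation, the function names, and the optional `num_relevant` argument (for cases where some relevant documents were not retrieved at all) are assumptions of this example.

```python
def average_precision(rels, num_relevant=None):
    """Average of P(r) over the ranks r at which a relevant document appears.

    rels: 0/1 relevance flags in rank order (rank 1 first).
    num_relevant: total number of relevant documents; defaults to the
    number of relevant documents in the list (an assumption of this sketch).
    """
    if num_relevant is None:
        num_relevant = sum(rels)
    if num_relevant == 0:
        return 0.0  # convention chosen for this sketch
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this cut-off rank
    return total / num_relevant


def mean_average_precision(runs):
    """Mean of the average precisions over a set of queries."""
    return sum(average_precision(rels) for rels in runs) / len(runs)


# Two hypothetical queries, each retrieving two relevant documents,
# but at different ranks.
early = [1, 0, 1, 0]
late = [0, 1, 0, 1]
print(average_precision(early))
print(average_precision(late))
print(mean_average_precision([early, late]))
```

Both lists contain the same number of relevant documents, but the list that ranks them earlier receives the higher average precision.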

This method emphasizes returning more relevant documents earlier.

Common models are:

Standard Boolean model
Extended Boolean model
Fuzzy retrieval

Algebraic Models represent documents and queries usually as vectors, matrices or tuples.

Vector space model
Generalized vector space model
Topic-based vector space model
Extended Boolean model
Enhanced topic-based vector space model
Latent semantic indexing, also known as latent semantic analysis

Probabilistic Models treat the process of document retrieval as a multistage random experiment.

(For example a human or sophisticated algorithms.)

Open source information retrieval systems

ht://dig, open source web crawling software
Glimpse and Webglimpse, advanced site search software
Egothor, high-performance, full-featured text search engine written entirely in Java
Lemur, Language Modelling IR Toolkit
Lucene, Apache Jakarta project
MG, full-text retrieval system, now maintained by the Greenstone Digital Library Software Project
Smart, early IR engine from Cornell University
Terrier, TERabyte RetrIEveR, Information Retrieval Platform, written in Java
Wumpus, multi-user information retrieval system
Xapian, open source IR platform based on Muscat
Zebra, GPL structured text/XML/MARC boolean search IR engine supporting Z39.50 and Web Services
Zettair, compact and fast search engine written in C, able to handle large amounts of text

Other retrieval tools

ASPseek
iHOP, information retrieval system for the biomedical domain
EBIMed, information retrieval (and extraction) system over Medline
DataparkSearch, search engine written in C
Fluid Dynamics Search Engine (FDSE), an open source search engine written in Perl; freeware and shareware versions are available
GalaTex, XQuery Full-Text Search (XML query text search)
Information Storage and Retrieval Using Mumps (Online GPL Text)
mnoGoSearch, the renowned SQL search engine
Sphinx, free open-source SQL full-text search engine

Major Information retrieval research groups

Center for Intelligent Information Retrieval at UMASS
Information Retrieval at the Language Technologies Institute, Carnegie Mellon University
Information Retrieval at Microsoft Research Cambridge
Glasgow Information Retrieval Group
CIR Centre for Information Retrieval
Centre for Interactive Systems Research at City University, London
IIT Information Retrieval Lab
Information Retrieval Group at Université de Neuchâtel
PSU Intelligent Systems Research Laboratory
Information and Language Processing Systems at the University of Amsterdam
Information Retrieval Laboratory, Harbin Institute of Technology (mainly in Chinese)

Major figures in information retrieval

Gerard Salton
Hans Peter Luhn
W. Robertson
Abraham Bookstein
Stephen P Harter
David Blair

Other figures associated with information retrieval

Vannevar Bush
Paul DeMaine
Douglas Engelbart
Eugene Garfield
Robert R.
"About the future of automatic information retrieval" 1988 - Karen Sparck Jones, University of Cambridge "On theoretical argument in information retrieval" 2003 - W. "Information retrieval and computer science: an evolving relationship" 2006 - C.
