
Documents and abstracts are ``well-formed''

Most IR systems, and consequently most RF systems, are designed for and evaluated with documents or abstracts whose authors make a conscious effort to communicate something. In essence, all text indexed by an IR system is assumed to be relevant to some query.

As IR systems are deployed in new settings, this assumption about document quality is being tested. On the World Wide Web in particular, documents are often nothing more than a collection of single words and hyperlinks to other sites. A further difficulty with the Web is that authors, in an attempt to attract visitors, often insert misleading words or phrases so that their pages appear relevant to searches. Thus, document ranking algorithms need to be re-evaluated for documents that carry very little information as well as documents that carry intentionally misleading information.
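To make the concern concrete, the following is a minimal sketch, not a description of any particular IR system. It assumes a deliberately naive ranker that scores a document by the raw count of query terms it contains. The function names `raw_tf_score` and `rank` and the three sample pages are hypothetical, invented only for illustration.

```python
from collections import Counter


def raw_tf_score(query, document):
    """Sum the raw occurrence counts of each query term in the document.

    Assumption: whitespace tokenization and lowercasing only; real systems
    typically apply stemming, weighting, and length normalization.
    """
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank(query, docs):
    """Return document names ordered from highest to lowest score."""
    return sorted(docs, key=lambda name: raw_tf_score(query, docs[name]),
                  reverse=True)


# Hypothetical pages: one written to communicate something, one that is
# little more than a few words and links, and one stuffed with query terms.
pages = {
    "genuine": "a short guide to hiking trails near seattle with maps of each trail",
    "sparse": "links home hiking",
    "stuffed": "hiking hiking hiking trails trails trails buy cheap watches here",
}

ranking = rank("hiking trails", pages)
print(ranking)
```

Under this naive scoring, the page that merely repeats the query terms is ranked above the page that actually discusses the topic, and the sparse page gives the ranker almost nothing to work with.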



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997