The field of Information Retrieval (IR) has a rich history of developing technology designed to help users find relevant documents. One of the more important issues in the IR literature is handling ``novice'' users. Novice users are characterized as either completely inexperienced with the IR system at hand or simply unsure of the contents of a particular collection. A technology that has often been exploited to help novice users is Relevance Feedback (RF).
In a typical scenario, a user will engage in a cyclic pattern while looking for information. The user will first formulate and submit a query, and subsequently analyze the documents returned. Then, if the user is not satisfied, he or she will modify the query and begin the cycle again. Relevance Feedback is a technique by which the system reformulates the query using relevance judgments that the user supplies on particular documents or terms. For example, after submitting the initial query, the user may decide that the first, third, and eighth documents are relevant. An RF system can then use those judgments to reformulate and resubmit the query automatically, thus simplifying the potentially difficult task of selecting which terms and which ordering to use for the subsequent query.
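To make the reformulation step concrete, the following is a minimal sketch of one possible feedback round. It is an illustration only and is not one of the techniques surveyed later in this paper: it simply appends the terms that occur in the largest number of user-marked relevant documents. The function name `reformulate_query`, the parameter `num_new_terms`, and the toy documents are all assumptions introduced for this example.

```python
from collections import Counter


def reformulate_query(query_terms, ranked_docs, relevant_ranks, num_new_terms=2):
    """Expand a query with terms drawn from documents the user marked relevant.

    query_terms    -- list of terms in the current query
    ranked_docs    -- documents in ranked order, each given as a list of terms
    relevant_ranks -- 1-based ranks of the documents the user judged relevant
    num_new_terms  -- hypothetical knob: how many terms to add per round
    """
    counts = Counter()
    for rank in relevant_ranks:
        # Count each term at most once per relevant document.
        counts.update(set(ranked_docs[rank - 1]))

    existing = set(query_terms)
    candidates = [(term, n) for term, n in counts.items() if term not in existing]
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))

    return list(query_terms) + [term for term, _ in candidates[:num_new_terms]]


# Toy ranked list for the ambiguous query "jaguar" (made-up data).
docs = [
    ["jaguar", "cat", "habitat"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "rainforest"],
    ["jaguar", "car", "price"],
    ["jaguar", "football"],
    ["jaguar", "car", "engine"],
    ["jaguar", "os"],
    ["jaguar", "cat", "habitat", "prey"],
]

# The user marks the first, third, and eighth documents as relevant.
new_query = reformulate_query(["jaguar"], docs, relevant_ranks=[1, 3, 8])
print(new_query)  # ['jaguar', 'cat', 'habitat']
```

In this sketch the user never has to think of the terms ``cat'' or ``habitat''; they are inferred from the three judgments. A practical system would also weight terms and might draw on non-relevant documents, which this example deliberately leaves out.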
Nowhere is the need to properly handle novice users more pronounced than on the World Wide Web. The average user on the Web is certainly not a professional searcher, and often assumes that ``everything'' is on the Web. When users search the Web, currently composed of well over 50 million documents, they often find that their queries match several thousand documents. It would seem obvious that some type of Relevance Feedback could be used to help users refine their queries so as to return documents that are indeed germane. Curiously, this is not yet the case.
This paper explores why Relevance Feedback is conspicuously absent from the Web. It begins by presenting a high-level overview of information retrieval, followed by a formal mathematical background. It then surveys four Relevance Feedback techniques and examines how those techniques could be combined. Finally, it concludes with some speculation as to why Relevance Feedback has not yet been widely adopted on the Web, and what could be done to add it.