Rocchio's feedback technique is based on approximating an optimal query. To describe the technique, it is first necessary to state his definition of an optimal query.
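Given a document collection $D$ and a query $Q$, Rocchio assumes the
existence of $D_R$, the set of documents in $D$ that are relevant
to $Q$. He further defines an ideal request as a query
that ranks all elements of $D_R$ above all other elements in $D$.
However, he is quick to point out that the ideal request is unlikely
to exist for all queries. Thus, he defines an optimal request
query $Q_{opt}$, which is the query that maximizes the average similarity
to the relevant documents and minimizes the average similarity to the
irrelevant documents. In mathematical terms, Rocchio defines $C$ as

$$C = \frac{1}{n_R} \sum_{d \in D_R} r(Q, d) - \frac{1}{N - n_R} \sum_{d \notin D_R} r(Q, d)$$

where $N$ is the number of documents in $D$, $n_R$ is the number of
documents in $D_R$, and $r(Q, d)$ is the similarity between $Q$ and $d$.
$Q_{opt}$ is the vector that corresponds to $Q$ when $C$ is maximal.
To find this,
Rocchio uses the cosine similarity function defined by
Equation (2) to determine the correlation between a query
and a given document. Substituting for $r(Q, d)$ he derives

$$C = \frac{Q}{|Q|} \cdot \left( \frac{1}{n_R} \sum_{d \in D_R} \frac{d}{|d|} - \frac{1}{N - n_R} \sum_{d \notin D_R} \frac{d}{|d|} \right) = \frac{Q}{|Q|} \cdot A$$

where $A$ denotes the vector in parentheses.
Rocchio observes that $C$ is maximized when $Q = kA$ for any positive
scalar $k$. Thus,

$$Q_{opt} = \frac{1}{n_R} \sum_{d \in D_R} \frac{d}{|d|} - \frac{1}{N - n_R} \sum_{d \notin D_R} \frac{d}{|d|}$$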
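The optimal query can be illustrated numerically. The following is a minimal sketch, assuming documents are plain lists of term weights and that both the relevant and the non-relevant sets are non-empty; the function names (`optimal_query`, `criterion`) and the toy vectors are hypothetical and chosen only for illustration. It builds the difference of the two normalized centroids and evaluates the criterion $C$ for any query, so the claim that no other query scores higher can be checked on small data.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def normalized_centroid(docs):
    """Mean of the length-normalized document vectors (docs must be non-empty)."""
    total = [0.0] * len(docs[0])
    for d in docs:
        for i, x in enumerate(unit(d)):
            total[i] += x
    return [x / len(docs) for x in total]

def optimal_query(relevant, nonrelevant):
    """Difference between the relevant and non-relevant normalized centroids."""
    pos = normalized_centroid(relevant)
    neg = normalized_centroid(nonrelevant)
    return [p - n for p, n in zip(pos, neg)]

def cosine(q, d):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(unit(q), unit(d)))

def criterion(q, relevant, nonrelevant):
    """C: mean cosine to relevant documents minus mean cosine to the others."""
    mean_rel = sum(cosine(q, d) for d in relevant) / len(relevant)
    mean_non = sum(cosine(q, d) for d in nonrelevant) / len(nonrelevant)
    return mean_rel - mean_non

# Toy collection (illustrative values only): three terms, four documents.
relevant = [[1.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
nonrelevant = [[0.0, 1.0, 1.0], [0.0, 0.0, 3.0]]
q_opt = optimal_query(relevant, nonrelevant)
print(criterion(q_opt, relevant, nonrelevant))            # value of C at the optimum
print(criterion([1.0, 0.0, 0.0], relevant, nonrelevant))  # some other query, for comparison
```

Because $C$ depends only on the direction of the query, rescaling `q_opt` by any positive constant leaves the criterion unchanged.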
While the formula for $Q_{opt}$ shows that there exists an optimal
query for a given set of relevant documents $D_R$, this formula
does not help retrieve those documents, as knowledge of $D_R$
obviates the need for retrieval, and for constructing $Q_{opt}$,
in the first place.
Rocchio then defines his Relevance Feedback technique as an iterative
process whereby the user begins with a query $Q_0$ that returns a set of
documents $D_0$, and continues to reformulate the query
until the user is satisfied that the retrieved set $D_i$ is $D_R$.
The question that now arises is: by what process does $Q_i$ become
$Q_{i+1}$?
Rocchio describes the abstract mathematical model as

$$Q_{i+1} = f(Q_i, R, S)$$
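where $R$ is the set of documents the user has judged relevant, and
$S$ the set of documents judged irrelevant. Rocchio suggests that $f$ should
be

$$Q_{i+1} = Q_i + \frac{1}{n_1} \sum_{d \in R} \frac{d}{|d|} - \frac{1}{n_2} \sum_{d \in S} \frac{d}{|d|}$$

or alternatively

$$Q_{i+1} = \alpha Q_i + \frac{\beta}{n_1} \sum_{d \in R} \frac{d}{|d|} - \frac{\gamma}{n_2} \sum_{d \in S} \frac{d}{|d|} \tag{8}$$

where $n_1$ and $n_2$ are the numbers of documents in $R$ and $S$, and
$\alpha$, $\beta$ and $\gamma$ are weighting constants. Equation (8) is the
form commonly seen in subsequent literature.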
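A single feedback step of the weighted form can be sketched as follows. This is an illustration under stated assumptions, not Rocchio's own implementation: vectors are plain lists of term weights, documents are length-normalized before averaging, the name `rocchio_update` is hypothetical, and the default weights of 1.0 are placeholders rather than recommended values.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Return alpha*Q + beta*mean(unit relevant docs) - gamma*mean(unit non-relevant docs)."""
    new_query = [alpha * q for q in query]
    for docs, weight in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # no judgments of this kind: the term contributes nothing
        for d in docs:
            for i, x in enumerate(unit(d)):
                new_query[i] += weight * x / len(docs)
    return new_query

# Illustrative call: one document judged relevant, one judged irrelevant.
q1 = rocchio_update([1.0, 0.0, 0.0], [[0.0, 2.0, 0.0]], [[0.0, 0.0, 5.0]])
```

Setting `alpha`, `beta` and `gamma` all to 1 gives the unweighted form; repeated calls, each with fresh judgments, give the iterative process.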
Rocchio made the initial evaluation of his method using 17 requests on the SMART system. He made some modifications to the technique; in particular, since SMART was unable to handle negative weights, any negative weight was set to 0. His results were very promising, showing approximately a 10% increase in precision at every level of recall when using Relevance Feedback, although with such a small sample the results could not be taken as statistically significant.
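The negative-weight adjustment described above amounts to a one-line post-processing step on the reformulated query. A minimal sketch, assuming the query is a plain list of term weights (the function name is hypothetical):

```python
def clamp_negative_weights(query):
    """Replace every negative term weight with 0.0, leaving the others unchanged."""
    return [max(0.0, w) for w in query]

clamped = clamp_negative_weights([1.0, 1.0, -1.0])
```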
Ide conducted some experimental work on Rocchio's formula, as well as
some modifications, again using the SMART system [22]. Ide
verified that Rocchio's method did in fact result in an increase in
performance, and also suggested a modification of Rocchio's formula
which is often seen, commonly called Ide Dec-Hi:

$$Q_{i+1} = Q_i + \sum_{d \in R} d - s$$

where $s$ is the first non-relevant document retrieved. Ide Dec-Hi
is commonly thought to be the best pure vector-based formula
available [43].
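The Ide Dec-Hi update can be sketched in a few lines. This is an illustration only, assuming the retrieved documents are supplied in rank order together with a parallel list of relevance judgments, and that document vectors are added without normalization or weighting; the function name and argument layout are hypothetical.

```python
def ide_dec_hi(query, ranked_docs, judgments):
    """Add every relevant document; subtract only the first non-relevant one.

    ranked_docs: document vectors in retrieval order (best first).
    judgments:   parallel list of booleans, True meaning judged relevant.
    """
    new_query = list(query)
    subtracted = False
    for doc, is_relevant in zip(ranked_docs, judgments):
        if is_relevant:
            new_query = [q + x for q, x in zip(new_query, doc)]
        elif not subtracted:
            # Only the highest-ranked non-relevant document is used.
            new_query = [q - x for q, x in zip(new_query, doc)]
            subtracted = True
    return new_query

q_next = ide_dec_hi(
    [1.0, 0.0, 0.0],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 0.0], [0.0, 0.0, 7.0]],
    [True, False, True, False],
)
```

In this sketch the second non-relevant document (`[0.0, 0.0, 7.0]`) has no effect on the result, which is the difference from subtracting every non-relevant document.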
The result of Rocchio's work is an elegant and simple way of incorporating relevance information to refine queries. The user interface need only let a user mark documents as either relevant or irrelevant, and the system infers which components of those documents make them relevant or irrelevant. It is also computationally feasible, being linear in the number of documents retrieved and terms per document. Altogether, this technique combines a strong mathematical foundation with a useful practical method.