Robertson and Sparck Jones describe a probabilistic system as one
that determines the weights of terms in document and query vectors
probabilistically [24]. Using log likelihoods, they
derive the following expression for the ranking of a document based on
the sum of probabilistically derived weights:
$$
\mathrm{sim}(D, Q) = \sum_{i=1}^{m} \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)} \tag{4}
$$
where $p_i$ is the probability that term $t_i$ is in a relevant
document and $q_i$ is the probability that $t_i$ is in a non-relevant
document. $m$ is the number of terms in the collection. The question
now becomes how to calculate $p_i$ and $q_i$.
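For a given term $t_i$ and a query $Q$, Robertson and Sparck Jones
define:

| | Relevant | Non-relevant | Total |
|---|---|---|---|
| $t_i$ present | $r_i$ | $n_i - r_i$ | $n_i$ |
| $t_i$ absent | $R - r_i$ | $N - n_i - R + r_i$ | $N - n_i$ |
| Total | $R$ | $N - R$ | $N$ |

Here $N$ is the number of documents in the collection, $R$ is the number of documents relevant to $Q$, $n_i$ is the number of documents containing $t_i$, and $r_i$ is the number of relevant documents containing $t_i$.
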
Using this table, $p_i$ and $q_i$, the probabilities that the term
occurs in a relevant document and in a non-relevant document, are defined by
Robertson and Sparck Jones as
$$
p_i = \frac{r_i}{R}, \qquad q_i = \frac{n_i - r_i}{N - R}
$$
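and by substituting back into Equation (4) they arrive at
$$
\mathrm{sim}(D, Q) = \sum_{i=1}^{m} \log \frac{r_i / (R - r_i)}{(n_i - r_i) / (N - n_i - R + r_i)}
$$
also referred to as the f4 formula. In order to avoid
undefined weights when $r_i$ is 0, a common case in operational
systems, 0.5 is added to each of the four quantities in the ratio, resulting in the
f4 point 5 formula:
$$
\mathrm{sim}(D, Q) = \sum_{i=1}^{m} \log \frac{(r_i + 0.5) / (R - r_i + 0.5)}{(n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)}
$$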
$N$ and $n_i$ are easily obtainable. However, since the set of relevant
documents is unknown to the system, it is necessary to estimate $R$ and $r_i$ for the
initial query. There are a variety of estimation methods
available [21, ] for initial queries,
although the common technique is to use the f4 point 5 or
similar [36] formula and set $R$ and $r_i$ to 0,
so that the initial weight of every term in a query is computed from $N$ and $n_i$ alone.
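
As a minimal sketch, a single summand of the f4 point 5 formula can be computed directly from the four counts. The function name `f4_point5_weight`, the consistency check, and the use of the natural logarithm are assumptions made for illustration; the text does not specify a log base or an implementation.

```python
import math

def f4_point5_weight(N, n_i, R, r_i):
    """Weight of one term under the f4 point 5 formula (hypothetical helper).

    N   : number of documents in the collection
    n_i : number of documents containing the term
    R   : number of documents known to be relevant to the query
    r_i : number of known relevant documents containing the term
    """
    # Every cell of the contingency table must be non-negative.
    if min(r_i, n_i - r_i, R - r_i, N - n_i - R + r_i) < 0:
        raise ValueError("inconsistent counts")
    relevant_odds = (r_i + 0.5) / (R - r_i + 0.5)
    nonrelevant_odds = (n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)
    # Natural log is an assumption; any base only rescales all weights.
    return math.log(relevant_odds / nonrelevant_odds)

# Illustrative counts (not from the text): 1000 documents, 50 contain the
# term, 10 are known relevant, and 5 of those contain the term.
w = f4_point5_weight(N=1000, n_i=50, R=10, r_i=5)
print(round(w, 2))  # about 3.03
```

Because 0.5 is added to every cell, the function stays defined even when one of the counts in the table is zero.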
Relevance feedback is implemented in this system by successively
updating $R$ and $r_i$ after each iteration. The theory is that after
enough iterations, the values for $p_i$ and $q_i$ will converge upon
their true values, and thus the ranking formula defined by Equation
(4) will produce the full set of relevant documents.
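
The feedback loop described above can be sketched on a toy collection. Everything here is an illustrative assumption rather than the authors' implementation: documents are modelled as sets of terms, only query terms present in a document contribute to its score, the helper names `rank` and `feedback` are hypothetical, and the relevance judgment for `d3` stands in for a user's response.

```python
import math
from collections import Counter

def f4_point5_weight(N, n_i, R, r_i):
    # f4 point 5 weight for one term (natural log is an assumption).
    return math.log(((r_i + 0.5) / (R - r_i + 0.5)) /
                    ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)))

def rank(docs, query_terms, R, r):
    """Order document ids by the sum of weights of the query terms they contain."""
    N = len(docs)
    n = {t: sum(t in terms for terms in docs.values()) for t in query_terms}
    w = {t: f4_point5_weight(N, n[t], R, r[t]) for t in query_terms}
    scores = {doc_id: sum(w[t] for t in query_terms if t in terms)
              for doc_id, terms in docs.items()}
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

def feedback(docs, query_terms, judged_relevant, R, r):
    """Fold newly judged relevant documents into R and the per-term counts r_i."""
    for doc_id in judged_relevant:
        R += 1
        for t in query_terms:
            if t in docs[doc_id]:
                r[t] += 1
    return R, r

docs = {
    "d1": {"probabilistic", "retrieval", "model"},
    "d2": {"retrieval", "system"},
    "d3": {"probabilistic", "weights"},
    "d4": {"vector", "model"},
    "d5": {"cooking", "recipes"},
}
query = ["probabilistic", "retrieval"]

R, r = 0, Counter()                      # initial query: R = r_i = 0
initial = rank(docs, query, R, r)
R, r = feedback(docs, query, ["d3"], R, r)  # one iteration: d3 judged relevant
updated = rank(docs, query, R, r)
print(initial[0], updated[0])  # d1 d3
```

After a single judgment the weight of "probabilistic" rises and the weight of "retrieval" falls, so `d3` moves to the top of the ranking. With so few judgments the estimates are still crude, which is why the scheme relies on repeated iterations.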