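Robertson uses the weighting function
$$w_t = \log \frac{p_t (1 - q_t)}{q_t (1 - p_t)}$$
to arrive at the weight $w_t$ for term $t$, where $p_t$ is the probability that $t$ occurs in a relevant document and $q_t$ is the probability that it occurs in a non-relevant document; the sum of these weights over all matching terms can be used to rank documents and is the basis for Equation (4).
Using the Swets model, Robertson considers the effect of adding a term $t$ which has weight $w_t$. Writing $\mu_R$ and $\mu_N$ for the means of the match-value distributions of relevant and non-relevant documents, Robertson shows that the means of the two distributions in the Swets model become:
$$\mu_R' = \mu_R + p_t w_t, \qquad \mu_N' = \mu_N + q_t w_t$$
and subsequently the new distance between them becomes:
$$\mu_R' - \mu_N' = \mu_R - \mu_N + w_t (p_t - q_t)$$
Thus, Robertson shows that inclusion of term $t$ should increase the effectiveness of retrieval by
$$a_t = w_t (p_t - q_t)$$
This equation is commonly called the wpq formula.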
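As a concrete illustration of the wpq formula, the following minimal Python sketch computes the score of a single term from document counts. It assumes the usual estimates p = r/R and q = (n - r)/(N - R), where r is the number of known relevant documents containing the term, R the number of known relevant documents, n the number of documents containing the term, and N the size of the collection. These estimates, the function names and the sample counts are illustrative assumptions rather than details taken from Robertson's paper.

```python
import math


def relevance_weight(p: float, q: float) -> float:
    """Log-odds term weight: log(p(1 - q) / (q(1 - p)))."""
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def wpq(r: int, n: int, R: int, N: int) -> float:
    """Selection score: weight * (p - q), with p and q estimated from counts."""
    # p: fraction of the known relevant documents that contain the term
    p = r / R
    # q: fraction of the remaining documents that contain the term
    q = (n - r) / (N - R)
    return relevance_weight(p, q) * (p - q)


# Hypothetical counts: the term occurs in 8 of 10 relevant documents
# and in 50 of the 1000 documents in the collection.
score = wpq(r=8, n=50, R=10, N=1000)
```

The sketch fails when p or q equals 0 or 1, since the logarithm or the division is then undefined; smoothed estimates (for example, adding 0.5 to each count) are a common way around this, and are likewise an assumption here.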
Robertson notes that this selection function is actually independent of the weighting function, and thus alternative weighting functions could be used instead of the one he originally picked.
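To make this independence concrete, the sketch below treats the weighting function as a parameter of the selection step. It is a minimal illustration under stated assumptions: the per-term probabilities p and q (of occurrence in relevant and non-relevant documents) are invented sample values, and `unit_weight` is a hypothetical stand-in for an alternative weighting function, not one proposed in the source.

```python
import math
from typing import Callable, Dict, List, Tuple

# A weighting function maps (p, q) to a term weight.
Weighting = Callable[[float, float], float]


def log_odds_weight(p: float, q: float) -> float:
    """Log-odds relevance weight: log(p(1 - q) / (q(1 - p)))."""
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def unit_weight(p: float, q: float) -> float:
    """Hypothetical alternative: every term gets the same weight."""
    return 1.0


def rank_terms(stats: Dict[str, Tuple[float, float]], weight: Weighting) -> List[str]:
    """Order candidate terms by weight(p, q) * (p - q), best first."""
    scores = {term: weight(p, q) * (p - q) for term, (p, q) in stats.items()}
    return sorted(scores, key=lambda term: scores[term], reverse=True)


# Invented (p, q) estimates for three candidate expansion terms.
stats = {
    "retrieval": (0.8, 0.05),
    "system": (0.9, 0.6),
    "swets": (0.3, 0.001),
}

by_log_odds = rank_terms(stats, log_odds_weight)
by_unit = rank_terms(stats, unit_weight)
```

With these sample values the two weighting functions order the second and third terms differently, which shows that the choice of weighting function can still change which terms are selected even though the selection function itself is unchanged.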
Robertson points out that this function rests on the assumption that the distributions of ranking values are normal. Since this is unlikely to hold, he concedes that the wpq formula is more indicative than exact.
In a 1995 study, Efthimiadis evaluates a selection of formulas, including Robertson's wpq formula, f4 and f4 point 5, and EMIM, among others [12]. Efthimiadis concludes that the wpq and EMIM formulas, along with Porter's [32] and r-lohi [11], outperformed all others and were all equally useful.