
The wpq formula

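Robertson uses the weighting function

$$ w_t = \log \frac{p_t (1 - q_t)}{q_t (1 - p_t)} $$

to arrive at the weight $w_t$ for term $t$, where $p_t$ and $q_t$ are the probabilities that $t$ occurs in a relevant and in a non-relevant document, respectively; the sum of this weight over all terms can be used to rank documents and is the basis for Equation (4).

Using the Swets model, Robertson considers the effect of adding a term $t$ which has weight $w_t$. Writing $\mu_R$ and $\mu_N$ for the means of the relevant and non-relevant distributions, Robertson shows that the means of the two distributions in the Swets model become:

$$ \mu_R' = \mu_R + p_t w_t, \qquad \mu_N' = \mu_N + q_t w_t $$

and subsequently the new distance between them becomes:

$$ \mu_R' - \mu_N' = \mu_R - \mu_N + w_t (p_t - q_t) $$

Thus, Robertson shows that inclusion of term $t$ should increase the effectiveness of retrieval by

$$ a_t = w_t (p_t - q_t) $$

This equation is commonly called the wpq formula.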
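
As a minimal sketch of how this selection value might be computed, assume that p and q are the probabilities that a term occurs in a relevant and in a non-relevant document, that the term weight is the log-odds ratio log(p(1-q) / (q(1-p))), and that the wpq value is that weight multiplied by (p - q). The function names and the sample probabilities below are hypothetical and chosen only for illustration.

```python
import math


def term_weight(p: float, q: float) -> float:
    """Log-odds weight of a term, assuming 0 < p < 1 and 0 < q < 1."""
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def wpq(p: float, q: float) -> float:
    """Selection value: the term weight scaled by (p - q)."""
    return term_weight(p, q) * (p - q)


# Hypothetical (p, q) estimates for three candidate expansion terms.
candidates = {
    "retrieval": (0.60, 0.10),
    "system": (0.50, 0.40),
    "swets": (0.20, 0.01),
}

# Rank candidate terms by decreasing selection value.
ranked = sorted(candidates, key=lambda t: wpq(*candidates[t]), reverse=True)
print(ranked)
```

With these made-up numbers, "swets" has the largest weight but is ranked below "retrieval", because its (p - q) factor is much smaller. This illustrates that ranking terms by selection value can differ from ranking them by weight alone.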

Robertson notes that this selection function is actually independent of the weighting function, and thus alternative weighting functions could be used instead of the one he originally picked.
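
One way to picture this independence, assuming the selection value has the form weight times (p - q), is to pass the weighting function in as a parameter. In the sketch below, `selection_value`, `log_odds_weight`, and `log_ratio_weight` are hypothetical names, and `log_ratio_weight` is an arbitrary stand-in chosen for illustration, not an alternative that Robertson proposes.

```python
import math
from typing import Callable

WeightFn = Callable[[float, float], float]


def log_odds_weight(p: float, q: float) -> float:
    """Log-odds weight, assuming 0 < p < 1 and 0 < q < 1."""
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))


def log_ratio_weight(p: float, q: float) -> float:
    """Illustrative alternative weight: log of the ratio p / q."""
    return math.log(p / q)


def selection_value(p: float, q: float, weight: WeightFn) -> float:
    """Selection value for any weighting function: weight(p, q) * (p - q)."""
    return weight(p, q) * (p - q)


# The same selection rule applied with two different weighting functions.
print(selection_value(0.6, 0.1, log_odds_weight))
print(selection_value(0.6, 0.1, log_ratio_weight))
```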

Robertson points out that this function rests on the assumption that the distributions of ranking values are normal. Since this assumption is unlikely to hold, he concedes that the above formula is perhaps more indicative than concrete.

In a 1995 study, Efthimiadis evaluates a selection of formulas, including Robertson's wpq formula, F4 and F4.5, and EMIM, among others [12]. Efthimiadis concludes that the wpq and EMIM formulas, along with Porter's [32] and r-lohi [11], outperformed all others and were equally useful.



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997