
The Swets Model of IR System Performance

Assume an IR system uses a document ranking function r(Q, d). The Swets model examines the distribution of the values of r; in particular, it looks at two distributions: the distribution of values when d is in the relevant set of documents $R$, and the distribution when d is in the irrelevant set $\bar{R}$. Intuitively, the ranking values when $d \in R$ should generally be higher than when $d \in \bar{R}$. The assumption is that the more separated the distributions are, the better the ranking function. Assume for a moment that these two distributions exist with means of $\mu_R$ and $\mu_{\bar{R}}$, respectively. The quantity of interest is the distance between the two means, $\mu_R - \mu_{\bar{R}}$.

Figure 2:  The Swets model of ranking distribution

Figure 2 shows a graphical representation of the Swets model. Pictured are two distributions of values of a ranking function r, the leftmost distribution corresponding to values of r(Q, d) when d is not relevant, and the rightmost distribution corresponding to values of r(Q, d) when d is relevant. The mean of each distribution is highlighted, as is the distance between the means.

Swets uses the distance between the means as an approximation for the quality of the distributions, with higher values indicating better distributions. Intuitively, larger values of this distance imply that the ranking function r does a better job of differentiating relevant documents from irrelevant documents. While there are some issues in using this model to compare practical systems [5, 41], the Swets model nevertheless offers an attractive theory on which to base formulas for document ranking.
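
As a rough illustration of the idea, the following Python sketch estimates the two means from a sample of scored documents and reports the distance between them. It is a minimal sketch, not part of the original text: the function name swets_mean_separation, the document ids, and the score values are all hypothetical, and sample means stand in for the means of the underlying distributions. It also assumes the two ranking functions being compared produce values on a comparable scale.

```python
from statistics import mean


def swets_mean_separation(scores, relevant):
    """Return (mean_relevant, mean_irrelevant, distance) for scored documents.

    scores:   dict mapping a document id to its ranking value r(Q, d)
    relevant: set of document ids judged relevant to the query Q
    (Both argument shapes are assumptions made for this sketch.)
    """
    rel = [s for d, s in scores.items() if d in relevant]
    irr = [s for d, s in scores.items() if d not in relevant]
    if not rel or not irr:
        raise ValueError("need at least one relevant and one irrelevant document")
    mu_rel = mean(rel)   # sample mean of r(Q, d) over relevant documents
    mu_irr = mean(irr)   # sample mean of r(Q, d) over irrelevant documents
    return mu_rel, mu_irr, mu_rel - mu_irr


# Hypothetical ranking values for six documents under two ranking functions.
scores_a = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.3, "d5": 0.2, "d6": 0.1}
scores_b = {"d1": 0.6, "d2": 0.5, "d3": 0.7, "d4": 0.5, "d5": 0.4, "d6": 0.6}
relevant = {"d1", "d2", "d3"}

_, _, dist_a = swets_mean_separation(scores_a, relevant)
_, _, dist_b = swets_mean_separation(scores_b, relevant)

# Under the Swets model, the function with the larger distance between
# means is judged to separate relevant from irrelevant documents better.
print(dist_a > dist_b)
```

In this made-up data, the first ranking function places the relevant documents well above the irrelevant ones, so its distance between means is larger than that of the second function, whose two groups of scores overlap heavily.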



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997