
Best Methods

Having surveyed the primary techniques and models, the obvious question of "Which is the best?" arises. Not surprisingly, others have investigated this question.

A 1990 study by Salton and Buckley compares a variety of vector and probabilistic models, using query expansion with all terms and with a selected number of relevant terms [43]. They determine that Ide Dec-Hi augmented with a term selection formula is the best performer of the bunch, and that, in general, vector-based systems outperform probabilistic ones.

Naturally, this conclusion drew several counterclaims and further studies. Partially in response, TREC, the Text REtrieval Conference, was created. The idea behind TREC is that all interested parties are given the same test collections, run their IR systems on those collections, and submit their results. Everyone's results are then evaluated using exactly the same metrics, allowing whole systems to be compared. Recently, SMART has been outperforming other systems at TREC, although not by large margins [19].

Based on the Salton and Buckley study, along with recent TREC studies, it would seem that the best way to implement a Relevance Feedback system is to use Ide Dec-Hi with an appropriate term selection function, such as Equation (11). This approach is also much easier to implement, involving fewer calculations and manipulations than either the standard probabilistic or inference network models. However, while a vector-based model may currently yield the best performance, researchers have been unable to make large performance improvements to the top vector-based systems in recent years [28]. This suggests that, for a research system, the probabilistic model may still hold undiscovered techniques that yield greater performance.
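
To make the recommendation concrete, the following is a minimal sketch of an Ide Dec-Hi query update, in which the new query is the old query plus the vectors of all relevant documents minus the vector of the single top-ranked non-relevant document. Several details are assumptions made for illustration only: vectors are represented as Python dictionaries mapping terms to weights, the function and parameter names (ide_dec_hi, max_new_terms) are hypothetical, and the term selection step simply keeps the highest-weighted expansion terms. That selection rule is a stand-in and is not Equation (11). Dropping terms whose final weight is not positive is likewise a convention assumed here.

```python
from collections import defaultdict


def ide_dec_hi(query, relevant_docs, top_nonrelevant=None, max_new_terms=None):
    """Return an updated query vector using the Ide Dec-Hi rule.

    query           -- dict mapping term -> weight
    relevant_docs   -- list of dicts, one per document judged relevant
    top_nonrelevant -- dict for the highest-ranked non-relevant document, or None
    max_new_terms   -- hypothetical cap on the number of expansion terms kept
    """
    updated = defaultdict(float)

    # Start from the original query.
    for term, weight in query.items():
        updated[term] += weight

    # Add every relevant document vector.
    for doc in relevant_docs:
        for term, weight in doc.items():
            updated[term] += weight

    # Subtract only the top-ranked non-relevant document vector.
    if top_nonrelevant is not None:
        for term, weight in top_nonrelevant.items():
            updated[term] -= weight

    # Assumed stand-in for a term selection function: keep all original
    # query terms, plus the max_new_terms highest-weighted new terms.
    if max_new_terms is not None:
        new_terms = sorted(
            (t for t in updated if t not in query),
            key=lambda t: (-updated[t], t),
        )
        for term in new_terms[max_new_terms:]:
            del updated[term]

    # Assumed convention: discard terms whose weight is not positive.
    return {t: w for t, w in updated.items() if w > 0}


if __name__ == "__main__":
    q = {"retrieval": 1.0, "feedback": 1.0}
    rel = [
        {"retrieval": 0.5, "vector": 0.8},
        {"feedback": 0.4, "vector": 0.3, "ranking": 0.2},
    ]
    nonrel = {"retrieval": 0.2, "database": 0.9}
    print(ide_dec_hi(q, rel, nonrel, max_new_terms=1))
```

The update itself needs only additions and subtractions over sparse vectors, which illustrates why this approach involves fewer calculations than the probabilistic or inference network models. In a real system the term selection step would be replaced by whichever selection function is actually chosen.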



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997