Combining a probabilistic term weighting function and a probabilistic term selection function in one system is clearly possible, as the Haines and Croft system demonstrates. In fact, most modern systems now use both a weighting formula and a selection formula; even the SMART system, known for its use of massive term expansion, now selects terms rather than adding all available candidates [6].
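A system that uses both functions can be pictured as a two-stage pipeline: first rank the candidate expansion terms with a selection function and keep the top k, then assign each kept term a feedback weight. The sketch below is illustrative only. It assumes the Robertson-Sparck Jones relevance weight (with 0.5 smoothing) as the weighting function and a selection value of the form w·(p − q) as the selection function; neither is claimed to be the exact formula used by Haines and Croft, SMART, or this paper's own equations. The names `TermStats`, `rsj_weight`, `selection_value`, and `expand_and_weight`, and the example counts, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    r: int  # judged-relevant documents containing the term
    n: int  # documents in the whole collection containing the term


def rsj_weight(r: int, n: int, R: int, N: int) -> float:
    """Robertson-Sparck Jones relevance weight with 0.5 smoothing (assumed weighting function)."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


def selection_value(r: int, n: int, R: int, N: int) -> float:
    """Assumed selection function: weight times the difference between
    the term's rate in relevant (p) and non-relevant (q) documents."""
    p = r / R
    q = (n - r) / (N - R)
    return rsj_weight(r, n, R, N) * (p - q)


def expand_and_weight(candidates: list[TermStats], R: int, N: int, k: int) -> dict[str, float]:
    """Stage 1: keep the k candidates with the highest selection value.
    Stage 2: weight each kept term with the weighting function."""
    if R <= 0 or N <= R:
        raise ValueError("need at least one relevant document and N > R")
    ranked = sorted(candidates, key=lambda t: selection_value(t.r, t.n, R, N), reverse=True)
    return {t.term: rsj_weight(t.r, t.n, R, N) for t in ranked[:k]}


# Hypothetical example: 1400 documents, 5 judged relevant.
cands = [
    TermStats("wing", r=4, n=60),
    TermStats("flow", r=3, n=400),
    TermStats("the", r=5, n=1400),
    TermStats("rare", r=1, n=2),
]
weights = expand_and_weight(cands, R=5, N=1400, k=2)
print(weights)
```

In this toy run, "rare" receives a higher weight than "wing", yet "wing" has the higher selection value because it appears in most of the relevant documents. That difference is the reason for keeping selection and weighting as separate functions: ranking candidates by weight alone would favor terms that are merely rare.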
Although most published results are not directly comparable, they give a fair indication of the range of improvements to be expected. Haines and Croft report an average increase of 99.8% on the CACM abstract collection and 30.5% on the full-text WEST collection. In a follow-up to the study presented in this paper, Harman compares term selection, term weighting, and the combination of selection and weighting [18]. She shows that, using the selection formula defined by Equation (11) and the weighting formula defined by Equation (10), the average improvement in precision is 23.9% for reweighting alone, 91.0% for term selection alone, and 112.2% for the two combined. This study again uses the Cranfield collection of 1400 aeronautical abstracts with 225 queries.
Thus, as a baseline, a reasonable system using both term selection and term weighting should demonstrate at least a 100% performance increase, measured as the average improvement in precision, on collections containing only abstracts. On full-text collections, a reasonable system should show at least a 25% increase.
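To make these thresholds concrete, the sketch below computes a relative improvement and checks it against the two baselines stated above. It assumes the improvement is measured as the percentage change in average precision relative to the unexpanded baseline run; the function names and precision values are hypothetical.

```python
def percent_improvement(baseline_precision: float, new_precision: float) -> float:
    """Relative change in average precision over the baseline run, in percent."""
    if baseline_precision <= 0:
        raise ValueError("baseline precision must be positive")
    return 100.0 * (new_precision - baseline_precision) / baseline_precision


# Minimum improvements suggested in the text, in percent.
THRESHOLDS = {"abstracts": 100.0, "full_text": 25.0}


def meets_baseline(improvement_pct: float, collection_type: str) -> bool:
    """True if the improvement reaches the suggested minimum for this collection type."""
    return improvement_pct >= THRESHOLDS[collection_type]


# Hypothetical precision values, for illustration only.
abstract_gain = percent_improvement(0.25, 0.50)   # doubling precision
full_text_gain = percent_improvement(0.40, 0.52)
print(abstract_gain, meets_baseline(abstract_gain, "abstracts"))
print(full_text_gain, meets_baseline(full_text_gain, "full_text"))
```

Under this reading, a 100% increase means the feedback run doubles the baseline's average precision, while the 25% full-text target means raising it by one quarter.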