
Vector Model of IR

In the vector model, a query Q is represented as an attribute vector in a similar fashion to documents:
$$ Q = (q_1, q_2, \ldots, q_n) $$
where $q_i$ represents the weight of the $i$-th attribute in Q.
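As a minimal sketch, assume a fixed vocabulary in which the i-th entry plays the role of the i-th attribute and weights are supplied per term. Under that assumption a query is turned into a vector exactly like a document. The vocabulary and the helper name `to_vector` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical vocabulary: position i in this list is the i-th attribute.
VOCABULARY: List[str] = ["search", "engine", "web", "index", "ranking"]


def to_vector(term_weights: Dict[str, float],
              vocabulary: List[str] = VOCABULARY) -> List[float]:
    """Map {term: weight} onto an attribute vector over `vocabulary`.

    Attributes the text does not mention get weight 0.0; terms that are
    not in the vocabulary are silently dropped.
    """
    return [float(term_weights.get(term, 0.0)) for term in vocabulary]


# The query and a document are built the same way, over the same attributes.
query = to_vector({"web": 1.0, "search": 1.0})
document = to_vector({"web": 3.0, "search": 2.0, "index": 1.0})
print(query)     # [1.0, 0.0, 1.0, 0.0, 0.0]
print(document)  # [2.0, 0.0, 3.0, 1.0, 0.0]
```

Because both vectors share one attribute ordering, any ranking function can compare them position by position.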

To determine which documents satisfy the query, a similarity measure, or document ranking function, r, is needed. Common choices for r are the simple dot product
$$ r'(Q, d) = Q \cdot d = \sum_{i=1}^{n} q_i d_i , $$
the cosine function
$$ r''(Q, d) = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}} , $$
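and the similarity function
$$ r'''(Q, d) = \frac{\sum_{i=1}^{n} q_i d_i}{\sum_{i=1}^{n} q_i^2 + \sum_{i=1}^{n} d_i^2 - \sum_{i=1}^{n} q_i d_i} . $$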
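The dot product r' and the cosine r'' can be sketched in a few lines of Python. This is a minimal illustration, assuming Q and d are equal-length sequences of weights over the same attributes; returning 0.0 when either vector is all zeros is an assumed convention, not something the text specifies.

```python
import math
from typing import Sequence


def dot_product(q: Sequence[float], d: Sequence[float]) -> float:
    """r'(Q, d): the sum of q_i * d_i over all attributes."""
    if len(q) != len(d):
        raise ValueError("Q and d must be vectors over the same attributes")
    return sum(qi * di for qi, di in zip(q, d))


def cosine(q: Sequence[float], d: Sequence[float]) -> float:
    """r''(Q, d): the dot product divided by the product of the vector lengths."""
    norm_q = math.sqrt(dot_product(q, q))
    norm_d = math.sqrt(dot_product(d, d))
    if norm_q == 0.0 or norm_d == 0.0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot_product(q, d) / (norm_q * norm_d)


q = [1.0, 0.0, 1.0]
d = [2.0, 0.0, 2.0]
print(dot_product(q, d))  # 4.0
print(cosine(q, d))       # approximately 1.0: d points in the same direction as q
```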

Note that when the entries of Q and d are restricted to 0 or 1, the dot product counts the terms contained in both Q and d. The denominators of r'' and r''' are two different ways of normalizing for document length, so that long documents, which tend to have more and larger nonzero weights, are not favored merely because of their length.
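Both observations can be checked numerically. The sketch below uses made-up vectors and assumes raw term-frequency weights, so that a document whose text is repeated twice has every weight doubled. Under that assumption the dot product doubles while the cosine stays the same.

```python
import math


def dot_product(q, d):
    """r'(Q, d) = sum of q_i * d_i."""
    return sum(qi * di for qi, di in zip(q, d))


def cosine(q, d):
    """r''(Q, d); assumes neither vector is all zeros."""
    return dot_product(q, d) / (math.sqrt(dot_product(q, q)) * math.sqrt(dot_product(d, d)))


# 0/1 vectors over five hypothetical terms: 1 = term present, 0 = absent.
q = [1, 1, 0, 1, 0]
d = [1, 0, 0, 1, 1]
shared_terms = sum(1 for qi, di in zip(q, d) if qi == 1 and di == 1)
print(dot_product(q, d), shared_terms)  # 2 2

# Assumed term-frequency weights: repeating the document's text doubles every weight.
long_d = [2 * di for di in d]
print(dot_product(q, long_d))  # 4: the raw dot product rewards the longer document
print(math.isclose(cosine(q, d), cosine(q, long_d)))  # True: cosine ignores the length change
```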

To answer a given query Q, r(Q, d) is computed for every document d in the collection, and the documents with sufficiently high values of r are returned, usually sorted by decreasing r or grouped in some kind of clustered display [54].
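The retrieval step can be sketched as a linear scan: score every document, keep those above a cutoff, and sort by decreasing score. This is a minimal illustration, assuming an in-memory dict of document vectors; the `rank` function and its `threshold` parameter are hypothetical, with `threshold` standing in for "sufficiently high", and cosine serves as the default r only as an example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(q: Vector, d: Vector) -> float:
    """r''(Q, d); returns 0.0 if either vector is all zeros (assumed convention)."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(x * x for x in d))
    return dot / norm if norm > 0 else 0.0


def rank(query: Vector,
         collection: Dict[str, Vector],
         r: Callable[[Vector, Vector], float] = cosine,
         threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Score every document, keep those scoring above `threshold`,
    and return (doc_id, score) pairs sorted by decreasing score."""
    scored = [(doc_id, r(query, vec)) for doc_id, vec in collection.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


collection = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [2.0, 1.0, 0.0],
}
results = rank([1.0, 0.0, 1.0], collection, threshold=0.1)
print([doc_id for doc_id, _ in results])  # ['d1', 'd3']
```

Scoring every document mirrors the description above; a practical system would usually avoid the full scan, for example by consulting an inverted index so that only documents sharing at least one term with Q are scored.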



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997