
Definitions

For the purposes of this paper, I will use the following definitions:

$D = \{d_1, \ldots, d_N\}$
is a set of N documents in the collection. d will refer to an arbitrary document $d \in D$.
$T = \{t_1, \ldots, t_m\}$
is a set of m terms indexed in D. In this paper $t_i$ will generally be a single word, but richer definitions may be used without impacting the underlying theory. t will refer to an arbitrary term $t \in T$.
$w_i$
is the weight of term $t_i$. Weight is simply a numeric quantity that indicates the importance of a term.
$d_i$
is an individual document, represented by the vector of length m
$$d_i = (w_{i1}, w_{i2}, \ldots, w_{im})$$
where $w_{ij}$ is the weight of term $t_j$ in document $d_i$. In the simplest case, each $t_j$ is a term indexed from one or more documents, and $w_{ij}$ is either 1 or 0, depending on whether the term $t_j$ is present in or absent from document $d_i$.
$D_t$
is the set of documents restricted to those documents that contain the term t. Note that $|D_t|$ represents the Document Frequency of term t.
Q
is a Query containing terms $t \in T$. The specific structure of Q is model dependent and will be further defined.
|X|
is the size of set X. |V| will also be used to denote the length of a vector V.
$TF_{ij}$
is the number of times term $t_j$ appears in document $d_i$. This is the Term Frequency of $t_j$.
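
The definitions above can be made concrete with a small sketch. The three-document collection, the whitespace tokenizer, and every function name below are assumptions chosen for illustration, not part of the paper. The sketch covers only the simplest case, in which a document is a vector of 0/1 weights.

```python
from collections import Counter

# Hypothetical collection D with N = 3 documents; the texts are made up.
D = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat",
    "d3": "cats and dogs",
}

def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace, one word per term."""
    return text.lower().split()

# Term Frequency: tf[d][t] is the number of times term t appears in document d.
tf = {name: Counter(tokenize(text)) for name, text in D.items()}

# T: the m terms indexed in D, sorted so that vector positions are stable.
T = sorted(set().union(*tf.values()))

def binary_vector(name):
    """Document as a length-m vector of 0/1 weights (term present or absent)."""
    return [1 if t in tf[name] else 0 for t in T]

def docs_containing(t):
    """D_t: the set of documents that contain term t."""
    return {name for name, counts in tf.items() if t in counts}

def document_frequency(t):
    """|D_t|: the number of documents that contain term t."""
    return len(docs_containing(t))
```

With this toy collection, T has m = 9 terms, `binary_vector("d2")` is `[0, 0, 0, 1, 0, 0, 0, 1, 1]`, `document_frequency("sat")` is 2, and `tf["d1"]["the"]` is 2. A richer weighting scheme would replace only `binary_vector`; the sets D, T, and D_t stay the same.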



Erik Selberg
Wed Aug 6 12:24:17 PDT 1997