For the purposes of this paper, I will use the following
definitions:
- D = {d_1, ..., d_N} is a set of N documents in the
collection. d will refer to an arbitrary document
d in D.
- T = {t_1, ..., t_m} is a set of m terms indexed in
D. In this paper t_i
will generally be a single word, but
richer definitions may be used without impacting the underlying
theory. t will refer to an arbitrary term
t in T.
- w_j is the weight of term
t_j. Weight is simply a
numeric quantity that indicates the importance of a term.
- d_i is an individual document, represented by the vector of
length m

    d_i = (w_{i,1}, w_{i,2}, ..., w_{i,m})

where w_{i,j} is the weight of term t_j in document d_i. In the
simplest case, each t_j is a term indexed from one or more
documents, and w_{i,j} is either 1 or 0, depending on whether the
term t_j is present in or absent from document d_i.
- D_t is the set of documents in D
restricted to those documents that contain the term t. Note that
|D_t| represents the Document Frequency of term t.
- Q is a Query containing terms. The specific
structure of Q is model dependent and will be further defined.
- |X| is the size of set X. |V| will also be used to denote
the length of a vector V.
- tf_{t,d} is the number of times term t
appears in document d.
This is the Term Frequency of t.
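The definitions above can be made concrete with a small sketch. The three-document collection below is invented purely for illustration, and the names docs, terms, term_freq, binary_vector, docs_containing, and document_frequency are hypothetical helpers, not part of the paper. The sketch assumes the simplest case described above: each term is a single whitespace-separated word, and each weight is 1 or 0.

```python
from collections import Counter

# Hypothetical toy collection D with N = 3 documents (invented for illustration).
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat",
    "d3": "cats and dogs",
}

# Assumption: a term is a single whitespace-separated word.
tokens = {name: text.split() for name, text in docs.items()}

# T: the m distinct terms indexed from D, kept in a fixed (sorted) order.
terms = sorted({t for words in tokens.values() for t in words})

# Term Frequency: number of times term t appears in document d.
term_freq = {name: Counter(words) for name, words in tokens.items()}


def binary_vector(name):
    """Document as a length-m vector of 0/1 weights (term absent/present)."""
    return [1 if term_freq[name][t] > 0 else 0 for t in terms]


def docs_containing(t):
    """The documents of D restricted to those that contain term t."""
    return {name for name in docs if term_freq[name][t] > 0}


def document_frequency(t):
    """Size of the restricted set: the Document Frequency of term t."""
    return len(docs_containing(t))


print(terms)
print(binary_vector("d2"))
print(document_frequency("sat"), term_freq["d1"]["the"])
```

Here the term "sat" appears in two of the three documents, so its Document Frequency is 2, while "the" appears twice in the first document, so its Term Frequency there is 2. Note that without stemming, "cat" and "cats" count as distinct terms.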
Erik Selberg
Wed Aug 6 12:24:17 PDT 1997