# Bag of Words

## DBoW Library

The library consists of two main classes: `Vocabulary` and `Database`. A `Vocabulary` is trained offline from a large set of images, whereas a `Database` can be created and extended online. Both structures can be saved in binary or text format.

#### Weighting

Words in the vocabulary and in bag-of-words vectors are weighted. Four weighting schemes are implemented to set the weight *w\_i* of word *i*:

* Term frequency (*tf*): [![w\_i = \frac{n\_{id}}{n\_d}](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/tf.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/tf.gif), [![n\_{id}](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/nid.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/nid.gif): number of occurrences of word *i* in document *d*, [![n\_d](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/nd.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/nd.gif): number of words in document *d*.
* Inverse document frequency (*idf*): [![w\_i = log(\frac{N}{N\_i})](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/idf.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/idf.gif), [![N](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/N.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/N.gif): number of documents, [![N\_i](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/Ni.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/Ni.gif): number of documents containing word *i*.
* Term frequency -- inverse document frequency (*tf-idf*): [![w\_i = \frac{n\_{id}}{n\_d} log(\frac{N}{N\_i})](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/tf-idf.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/tf-idf.gif).
* Binary: [![w\_i = 1 if word i is present; 0 otherwise](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/binary.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/binary.gif)

DBow computes *N* and *N\_i* from the images provided when the vocabulary is created. These values are fixed afterwards and do not depend on how many entries a `Database` object contains.
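The four weighting schemes can be illustrated with a minimal Python sketch. This is not the library's API: the function names `idf_table` and `weigh`, the representation of a document as a list of integer word ids, and the toy data are all assumptions made for illustration. The idf table is built once from the training documents, mirroring the fact that *N* and *N\_i* are fixed at vocabulary creation.

```python
import math
from collections import Counter

def idf_table(training_docs):
    """Compute idf = log(N / N_i) once, from the training documents only."""
    N = len(training_docs)
    doc_freq = Counter()
    for doc in training_docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {word: math.log(N / N_i) for word, N_i in doc_freq.items()}

def weigh(doc, idf, scheme="tf-idf"):
    """Return {word_id: weight} for one document (hypothetical helper)."""
    # Assumption: words unseen during training are ignored.
    counts = Counter(w for w in doc if w in idf)
    n_d = sum(counts.values())  # number of words in the document
    weights = {}
    for word, n_id in counts.items():
        tf = n_id / n_d
        if scheme == "tf":
            weights[word] = tf
        elif scheme == "idf":
            weights[word] = idf[word]
        elif scheme == "tf-idf":
            weights[word] = tf * idf[word]
        elif scheme == "binary":
            weights[word] = 1.0
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
    return weights

# Toy data: each document is a list of word ids.
training = [[0, 1, 1, 2], [1, 3], [0, 1, 4, 4]]
idf = idf_table(training)

query = [1, 1, 4, 0]
for scheme in ("tf", "idf", "tf-idf", "binary"):
    print(scheme, weigh(query, idf, scheme))
```

Word `1` occurs in every training document, so its idf (and therefore its tf-idf weight) is zero, while the rarer word `4` gets the largest idf. Weighing further documents later leaves `idf` unchanged.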

#### Scoring

A score is calculated when two vectors are compared by means of a `Vocabulary` or when a `Database` is queried. These are the metrics implemented to calculate the score *s* between two vectors *v* and *w* (from now on, *v\** and *w\** denote vectors normalized with the L1-norm):

* Dot product: [![Dot product](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/dot.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/dot.gif)
* L1-norm: [![L1-norm](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/L1.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/L1.gif)
* L2-norm: [![L2-norm](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/L2.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/L2.gif)
* Bhattacharyya coefficient: [![Bhattacharyya coefficient](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/bhat.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/bhat.gif)
* χ² (chi-square) distance: [![Chi square distance](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/chisq_extended.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/chisq_extended.gif)
* KL-divergence: [![KL-divergence](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/kl_extended.gif)](https://raw.githubusercontent.com/dorian3d/dorian3d.github.io/master/other/images/kl_extended.gif)
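As a rough illustration of three of these metrics, the sketch below scores sparse vectors stored as `{word_id: weight}` dictionaries. The exact formulas used by the library are the ones shown in the images above; the forms here are assumptions: the dot product is taken on the raw vectors, the L1 score is written as 1 − ½‖*v\** − *w\**‖₁ (the form commonly given for DBoW-style scoring), and the Bhattacharyya coefficient follows its standard definition on L1-normalized, non-negative vectors. All function names are hypothetical.

```python
import math

def l1_normalize(vec):
    """Scale a sparse vector so that its absolute values sum to 1."""
    total = sum(abs(x) for x in vec.values())
    if total == 0:
        return dict(vec)
    return {k: x / total for k, x in vec.items()}

def dot_score(v, w):
    """Dot product over the words shared by both vectors."""
    return sum(x * w[k] for k, x in v.items() if k in w)

def l1_score(v, w):
    """Assumed form: 1 - 0.5 * ||v* - w*||_1, in [0, 1] for non-negative weights."""
    v_n, w_n = l1_normalize(v), l1_normalize(w)
    keys = set(v_n) | set(w_n)
    dist = sum(abs(v_n.get(k, 0.0) - w_n.get(k, 0.0)) for k in keys)
    return 1.0 - 0.5 * dist

def bhattacharyya_score(v, w):
    """Standard Bhattacharyya coefficient; requires non-negative weights."""
    v_n, w_n = l1_normalize(v), l1_normalize(w)
    return sum(math.sqrt(x * w_n[k]) for k, x in v_n.items() if k in w_n)

v = {0: 1.0, 1: 1.0}
w = {1: 1.0, 2: 1.0}
print(dot_score(v, w), l1_score(v, w), bhattacharyya_score(v, w))
```

With these definitions, identical vectors score 1 and vectors with no words in common score 0 under both the L1 and Bhattacharyya metrics, so a higher score means a closer match.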

By default, a vocabulary is created with *tf-idf* weighting and L1-norm scoring.

### References

* [dorian3d/DBow](https://github.com/dorian3d/DBow)
* [dorian3d/DBoW2](https://github.com/dorian3d/DBoW2)
* [rmsalinas/DBow3](https://github.com/rmsalinas/DBow3)
* [rmsalinas/fbow](https://github.com/rmsalinas/fbow)
