Scoring using the Vector Space Model

Previously we discussed tf-idf as a way to measure how relevant a search term is to each document in an indexed collection. For queries with multiple terms, we used the overlap score measure, which sums the tf-idf values of every term in the query. A more general and flexible way of scoring multi-term searches is the vector space model.
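
As a rough illustration of the idea, the sketch below represents documents and the query as sparse tf-idf vectors and ranks documents by cosine similarity. The helper names (`tf_idf_vector`, `cosine_similarity`), the toy documents, and the choice of raw term counts with a natural-log idf are assumptions made for this example, not necessarily the weighting used in the post.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, n_docs):
    """Map a list of tokens to a sparse {term: tf-idf weight} vector.

    Assumes raw term counts for tf and log(N / df) for idf; terms that
    never occur in the collection are dropped.
    """
    counts = Counter(tokens)
    return {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
        if doc_freq.get(term, 0) > 0
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical toy collection, tokenized by whitespace.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
    "d3": "dogs and cats make good pets".split(),
}

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
n_docs = len(docs)

doc_vectors = {
    name: tf_idf_vector(tokens, doc_freq, n_docs)
    for name, tokens in docs.items()
}
query_vector = tf_idf_vector("cat mat".split(), doc_freq, n_docs)

# Rank documents by how closely they point in the query's direction.
ranking = sorted(
    docs,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
print(ranking)
```

Unlike the overlap score, cosine similarity normalizes by vector length, so a long document does not rank higher merely because it repeats the query terms more often.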

Term Frequency - Inverse Document Frequency 101

Let us present a basic and beautiful Information Retrieval concept: tf-idf. To do so, we will use Python to define a basic in-memory “search engine” that allows us to add documents and search for them. The search results will contain the relevant documents together with their tf-idf values.
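
A minimal sketch of what such an in-memory engine might look like follows. The class name `SearchEngine`, its `add` and `search` methods, whitespace tokenization, and the raw-count tf with natural-log idf are all assumptions chosen for illustration; the post's own implementation may differ.

```python
import math
from collections import Counter, defaultdict


class SearchEngine:
    """Hypothetical in-memory index that scores single-term queries by tf-idf."""

    def __init__(self):
        self.documents = {}                # doc_id -> Counter of term counts
        self.postings = defaultdict(set)   # term -> ids of documents containing it

    def add(self, doc_id, text):
        """Index a document. Assumes each doc_id is added only once."""
        terms = Counter(text.lower().split())
        self.documents[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, term):
        """Return (doc_id, tf-idf) pairs for one term, best match first."""
        term = term.lower()
        matching = self.postings.get(term, set())
        if not matching:
            return []
        # idf = log(N / df): terms found in fewer documents weigh more.
        idf = math.log(len(self.documents) / len(matching))
        results = [
            (doc_id, self.documents[doc_id][term] * idf)
            for doc_id in matching
        ]
        return sorted(results, key=lambda pair: pair[1], reverse=True)


engine = SearchEngine()
engine.add("d1", "the cat sat on the mat")
engine.add("d2", "the dog chased the cat and the other cat")
engine.add("d3", "dogs make good pets")

results = engine.search("cat")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With this weighting, a term that appears in every indexed document gets an idf of zero, so it contributes nothing to the ranking; that is the intended effect of the inverse document frequency.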