Skip to main content

Posts

Showing posts from March 30, 2012

Stemmers and Lemmatizers in Search Algorithms

Searching / Search Engines are an essential part of every content specific software or web product. The essential part of a good search engine might involve indexing of the content to be searched so that the content can be stored like a hash table to access the content at O(1) complexity.

How do we achieve this indexing? There are two conventional ways to achieve this, which are Forward and Reverse Indexing. While Forward Indexing involves storing the indexes on a sample to token hash table format, Reverse Indexing stores the indexes as token to sample format.

For example if the sentence to be searched is - 'what is love?', the tokens would be 'what', 'is' and 'love'. Now the entry as per forward indexing would store the key as 'what is love?' and the value as a list of tuples 'what', 'is' and 'love', while as per reverse indexing, we would append the keys 'what', 'is' and 'love' withe the sample …