Posts

Showing posts from March 30, 2012

Stemmers and Lemmatizers in Search Algorithms

Searching / Search Engines are an essential part of every content specific software or web product. The essential part of a good search engine might involve indexing of the content to be searched so that the content can be stored like a hash table to access the content at O(1) complexity. How do we achieve this indexing? There are two conventional ways to achieve this, which are Forward and Reverse Indexing . While Forward Indexing involves storing the indexes on a sample to token hash table format, Reverse Indexing stores the indexes as token to sample format. For example if the sentence to be searched is - 'what is love?', the tokens would be 'what', 'is' and 'love'. Now the entry as per forward indexing would store the key as 'what is love?' and the value as a list of tuples 'what', 'is' and 'love', while as per reverse indexing , we would append the keys 'what', 'is' and 'love' withe the