High-level explanation of the Similarity class for Lucene?
Do you know where I can find a high-level explanation of the algorithm behind Lucene's Similarity class? I would like to understand it without having to decipher all the math and terms involved with searching and indexing.

Lucene's built-in Similarity is a fairly standard "Inverse Document Frequency" scoring algorithm. The Wikipedia article is brief, but covers the basics. The book Lucene in Action breaks down the Lucene formula in more detail; it doesn't mirror the current Lucene formula perfectly, but all of the main concepts are explained.

Primarily, the score varies with the number of times that term occurs
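
To make the idea concrete, here is a minimal Python sketch of term-frequency / inverse-document-frequency scoring. It is an illustration only, not Lucene's actual code: the function names (`tf`, `idf`, `score`) and the toy corpus are made up for this example, the square-root and logarithm damping are assumptions chosen to resemble a typical TF-IDF setup, and extras such as length normalization and boosts are left out entirely.

```python
import math

def tf(term_freq):
    # More occurrences of a term in a document raise the score,
    # but with diminishing returns (square-root damping is an assumption).
    return math.sqrt(term_freq)

def idf(doc_freq, num_docs):
    # Terms that appear in fewer documents are treated as more informative.
    # The +1 terms avoid division by zero; this smoothing is an assumption.
    return 1.0 + math.log((num_docs + 1) / (doc_freq + 1))

def score(query_terms, doc_tokens, corpus):
    # Sum tf * idf over the query terms that occur in the document.
    total = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for d in corpus if term in d)
        total += tf(freq) * idf(doc_freq, len(corpus))
    return total

# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["lucene", "scores", "documents"],
    ["lucene", "lucene", "similarity"],
    ["search", "engines", "rank", "documents"],
]

for i, doc in enumerate(corpus):
    print(i, score(["lucene", "similarity"], doc, corpus))
```

In this sketch the second document ranks highest for the query because it contains "lucene" twice and also contains the rarer term "similarity"; the third document matches neither term and scores zero.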