Index Analyzed and not Analyzed?

Question

I am using Lucene.net for indexing and searching in my application. I would like to offer users both normal search and regular expression search. For normal search I need to index my documents as analyzed, and for regular expression search I need to index them as not analyzed, but I can't index the same document twice to support both search types. Please help. Pravin thokal

Answer 1:

I highly recommend you index the document twice: first as an analyzed field and second as a non-analyzed field. Redundancy is not a bad thing with Lucene. Lucene uses an inverted index, so when the index grows, the growth is usually mostly pointers, which are typically cheap to store. (I am oversimplifying here. There are other factors to consider, like how many unique terms there are and what kind of analysis you're performing.)
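To make the two-field idea concrete, here is a toy sketch in plain Python. It is not Lucene.net code and does not use any Lucene API; the names `analyze`, `TwoFieldIndex`, `search`, and `regex_search` are made up for illustration, and the "analyzer" is just lowercasing plus splitting on non-alphanumeric characters. It only shows the shape of the approach: the same text goes into an analyzed map (one entry per term) and a non-analyzed map (the whole value kept verbatim), and each search type reads from the map that suits it.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class TwoFieldIndex:
    """Hypothetical in-memory index holding the same text in two forms."""

    def __init__(self):
        self.analyzed = defaultdict(set)  # analyzed term -> doc ids
        self.raw = defaultdict(set)       # untouched value -> doc ids

    def add(self, doc_id, text):
        # "Analyzed field": one posting per normalized term.
        for term in analyze(text):
            self.analyzed[term].add(doc_id)
        # "Non-analyzed field": the whole value is a single term.
        self.raw[text].add(doc_id)

    def search(self, query):
        """Normal search: docs that contain every analyzed query term."""
        terms = analyze(query)
        if not terms:
            return set()
        postings = [self.analyzed.get(t, set()) for t in terms]
        return set.intersection(*postings)

    def regex_search(self, pattern):
        """Regex search: pattern is tried against each untouched value."""
        rx = re.compile(pattern)
        hits = set()
        for value, doc_ids in self.raw.items():
            if rx.search(value):
                hits |= doc_ids
        return hits


index = TwoFieldIndex()
index.add(1, "Error-Code: AB-1234")
index.add(2, "error code missing")

normal_hits = index.search("error code")        # uses the analyzed terms
regex_hits = index.regex_search(r"AB-\d{4}")    # uses the raw values
```

In this sketch the normal query finds both documents because case and punctuation were analyzed away, while the regular expression only finds the document whose original text still contains `AB-1234`. The cost is that each document's text is represented twice, which is the redundancy the answer argues is acceptable.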

Indexing the text only once will lead to much slower search performance. Why? You'll have to store the non-analyzed text, which means your "normal searches" will have to perform analysis at search-time. (Why store the non-analyzed text? If you store the analyzed text, you won't be able to un-analyze it for your regular expression searches.)

Depending on your scenario there may be a middle ground for you to explore. For example, perhaps your regular expression searches can tolerate a little bit of analysis (such as case insensitivity), and perhaps your normal searches don't require that much analysis (such as preservation of noise words).

Source: https://stackoverflow.com/questions/23932667/index-analyzed-and-not-analyzed

Tags

indexing

lucene.net