Question
I have a database with a column I wish to index that has comma-delimited names, e.g.,
User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley"
I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this?
- Did I miss a simple option to set the tokenize delimiter?
- Do I have to subclass an existing class, or write my own class, to roll my own tokenizer?
- Something else? ;)
Or does Lucene.net not support phrases?
Or is it smart enough to handle this use case automatically?
I'm sure I'm not the first person to have to do this. Googling produced no noticeable solutions.
*** EDIT: using my example, I want to store these name phrases in a single field:
Helen Ready
Phil Collins
Brad Paisley
NOT these individual words:
Helen
Ready
Phil
Collins
Brad
Paisley
Answer 1:
Edit: Having read your clarification, here is hopefully a more relevant answer:
- You did not miss an option to modify the separator character.
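- You do need to roll your own tokenizer. I suggest you subclass CharTokenizer. You need to define isTokenChar() according to your spec, meaning that anything but a comma is a token char. Note that with input like "Helen Ready, Phil Collins" the space after each comma then becomes part of the next token, so trim it before indexing.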
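To illustrate the idea behind that predicate, here is a minimal sketch in plain Python. It does not use Lucene; the function names `is_token_char` and `tokenize` are hypothetical, and trimming the surrounding whitespace is an assumption added for this example rather than something CharTokenizer does for you.

```python
def is_token_char(ch: str) -> bool:
    """Anything except a comma belongs to the current token."""
    return ch != ","


def tokenize(text: str) -> list:
    """Emit maximal runs of token characters, trimmed of surrounding spaces."""
    tokens = []
    current = []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        else:
            # A comma ends the current token; skip tokens that are empty.
            token = "".join(current).strip()
            if token:
                tokens.append(token)
            current = []
    # Flush the last token, which has no trailing comma.
    token = "".join(current).strip()
    if token:
        tokens.append(token)
    return tokens


names = tokenize("Helen Ready, Phil Collins, Brad Paisley")
print(names)  # ['Helen Ready', 'Phil Collins', 'Brad Paisley']
```

Each name comes out as one token, which is the behaviour the question asks for. A real tokenizer subclass would express the same rule through its token-character predicate.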
Answer 2:
You can split the string by comma yourself, and then either:
- Index each name using the Keyword analyzer (non-tokenized), so each whole name is a single term
- OR index each name using the standard analyzer, and wrap your searches in quotes so they run as phrase queries. Make sure to index a dummy term in between each name so that "Ready Phil" doesn't match the document
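Why the dummy term matters in the second option can be shown with a toy model of word positions. This is only a sketch in plain Python, not Lucene code: `index_positions`, `phrase_matches`, and the `gap` parameter are hypothetical names, and the gap simply stands in for the dummy term between names.

```python
def index_positions(full_name_list, gap=1):
    """Return (word, position) pairs, skipping `gap` positions between names."""
    postings = []
    position = 0
    for name in full_name_list.split(","):
        words = name.lower().split()
        if not words:
            continue
        for word in words:
            postings.append((word, position))
            position += 1
        position += gap  # stands in for the dummy term between names
    return postings


def phrase_matches(postings, phrase):
    """True if the words of `phrase` occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    by_position = {pos: word for word, pos in postings}
    return any(
        all(by_position.get(start + i) == w for i, w in enumerate(words))
        for start in by_position
    )


names = "Helen Ready, Phil Collins, Brad Paisley"
with_gap = index_positions(names)            # gap of one position between names
without_gap = index_positions(names, gap=0)  # names run straight into each other

print(phrase_matches(with_gap, "Phil Collins"))    # True
print(phrase_matches(with_gap, "Ready Phil"))      # False
print(phrase_matches(without_gap, "Ready Phil"))   # True
```

Without the gap, the last word of one name sits directly before the first word of the next, so a quoted search spanning two names would match.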
Source: https://stackoverflow.com/questions/2447139/lucene-net-support-phrases-what-is-best-approach-to-tokenize-comma-delimited-d