Does Lucene.Net support phrases? What is the best approach to tokenizing comma-delimited data (atomically) in fields during indexing?

I have a database with a column I wish to index that has comma-delimited names, e.g.,

User.FullNameList = "Helen Ready, Phil Collins, Brad Paisley"

I prefer to tokenize each name atomically (name as a whole searchable entity). What is the best approach for this?

Or does Lucene.net not support phrases?

Or is it smart enough to handle this use case automatically?

I'm sure I'm not the first person to have to do this. Googling produced no noticeable solutions.

*** EDIT: using my example, I want to store these name phrases in a single field:

Helen Ready

Phil Collins

Brad Paisley

NOT these individual words:

Helen

Ready

Phil

Collins

Brad

Paisley

Edit: Having read your clarification, here is hopefully a more relevant answer:

You did not miss anything: there is no option to change the separator character, so you do need to roll your own tokenizer.
I suggest you subclass CharTokenizer and define isTokenChar() according to your spec, meaning that every character except a comma is a token character.
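
To make the rule concrete, here is a minimal plain-Java sketch of what "anything but a comma is a token char" produces for the example string. It deliberately does not depend on Lucene: the class CommaTokenSketch and its tokenize() helper are hypothetical names used only for illustration, and only the isTokenChar() predicate corresponds to the method you would override in a real CharTokenizer subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-alone imitation of a CharTokenizer-style scan,
// not Lucene's CharTokenizer itself. All names here are hypothetical.
class CommaTokenSketch {

    // The rule from the answer: every character except a comma is a token character.
    static boolean isTokenChar(char c) {
        return c != ',';
    }

    // Groups consecutive token characters into tokens and drops the separators.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Brackets make leading whitespace visible in the output.
        for (String token : tokenize("Helen Ready, Phil Collins, Brad Paisley")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Note that with this rule alone, every name after the first keeps the space that follows the comma (" Phil Collins"), so you would probably also want to trim each token or normalize the input before indexing.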

Alternatively, you can split the string on commas yourself and then either:

Index each name using the keyword analyzer (non-tokenized),
OR index each name using the standard analyzer and wrap your searches in quotes. In that case, make sure to index a dummy term between consecutive names so that the phrase "Ready Phil" doesn't match the document.
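
The splitting step itself can be sketched in plain Java as follows. This is only the pre-processing part, under the assumption that names never contain commas; NameSplitSketch and splitNames() are hypothetical names, and the actual Lucene call that adds each name to the field is intentionally left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: pre-processing before indexing. All names are hypothetical.
class NameSplitSketch {

    // Splits a comma-delimited list into trimmed, non-empty names.
    static List<String> splitNames(String fullNameList) {
        List<String> names = new ArrayList<>();
        for (String part : fullNameList.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Each printed name would be added to the same field as its own value,
        // either non-tokenized or analyzed, depending on which option you pick.
        for (String name : splitNames("Helen Ready, Phil Collins, Brad Paisley")) {
            System.out.println(name);
        }
    }
}
```

With the non-tokenized option, each name is matched only as a whole and exactly as stored, so any case normalization would have to be applied both here and at query time.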

Source: https://stackoverflow.com/questions/2447139/lucene-net-support-phrases-what-is-best-approach-to-tokenize-comma-delimited-d
