Question
I'm a beginner with Lucene. Here's my source:
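// StringField.TYPE_STORED is indexed and stored, but not tokenized
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(false);
ft.setStored(true);
// Copy of the same type, but with tokenization switched on
FieldType ftNA = new FieldType(StringField.TYPE_STORED);
ftNA.setTokenized(true);
ftNA.setStored(true);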
Why tokenize in Lucene? For example, take the String value "my name is lee":
- tokenized: "my", "name", "is", "lee"
- not tokenized: "my name is lee"
I don't understand why you would index with tokenization. What is the difference between tokenized and not tokenized?
Answer 1:
Lucene works by finding tokens in documents which satisfy constraints expressed by a query.
If you search for lee, for instance, the query will find all documents that contain the token lee. If the field isn't tokenized, the whole value is indexed as a single token, so only a query for the exact value my name is lee will match; a query for just lee will not.
Now suppose you search for "is lee". This is a PhraseQuery, which means it matches documents in which the token is is immediately followed by the token lee.
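To make the phrase case concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's PhraseQuery: the class ToyPhraseMatcher and its methods are made-up names, tokens are split on whitespace only, and the match requires strictly consecutive positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, NOT Lucene's PhraseQuery implementation.
final class ToyPhraseMatcher {

    // Map each token to the positions at which it occurs in one field value.
    static Map<String, List<Integer>> positions(String value) {
        Map<String, List<Integer>> result = new HashMap<>();
        String[] tokens = value.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            result.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return result;
    }

    // True if the phrase's tokens occur at consecutive positions in the value.
    static boolean matchesPhrase(String value, String phrase) {
        Map<String, List<Integer>> index = positions(value);
        String[] terms = phrase.trim().split("\\s+");
        List<Integer> starts = index.getOrDefault(terms[0], Collections.emptyList());
        for (int start : starts) {
            boolean allFound = true;
            // Each following term must sit exactly one position further on.
            for (int i = 1; i < terms.length && allFound; i++) {
                allFound = index.getOrDefault(terms[i], Collections.emptyList())
                                .contains(start + i);
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "my name is lee";
        System.out.println(matchesPhrase(value, "is lee")); // true
        System.out.println(matchesPhrase(value, "lee is")); // false
    }
}
```

The point is that a phrase match depends on per-token positions, which only exist once the value has been split into tokens.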
Tokenization is needed because Lucene works with an inverted index, i.e., it maps tokens to the documents that contain them.
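As a rough illustration of that mapping, here is a toy inverted index in plain Java. It is only a sketch under simplifying assumptions: ToyInvertedIndex and its methods are hypothetical names, not part of Lucene, and "tokenizing" here just means splitting on whitespace, with no analyzer involved.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: hypothetical names, NOT Lucene's index format or API.
final class ToyInvertedIndex {
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final boolean tokenized;

    ToyInvertedIndex(boolean tokenized) {
        this.tokenized = tokenized;
    }

    void add(int docId, String value) {
        // Tokenized: one token per whitespace-separated word.
        // Not tokenized: the whole value is indexed as a single token.
        String[] tokens = tokenized
                ? value.trim().split("\\s+")
                : new String[] { value };
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Exact-token lookup, the toy counterpart of a single-term query.
    Set<Integer> search(String token) {
        return postings.getOrDefault(token, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex withTokens = new ToyInvertedIndex(true);
        withTokens.add(1, "my name is lee");
        ToyInvertedIndex wholeValue = new ToyInvertedIndex(false);
        wholeValue.add(1, "my name is lee");

        System.out.println(withTokens.search("lee"));            // [1]
        System.out.println(wholeValue.search("lee"));            // []
        System.out.println(wholeValue.search("my name is lee")); // [1]
    }
}
```

With tokenization, the index holds four keys (my, name, is, lee), so a lookup for lee finds document 1. Without it, the only key is the full string, so only a lookup for the exact value succeeds.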
Source: https://stackoverflow.com/questions/29457148/why-tokenize-texts-in-lucene