Question
I'm new to Lucene. I started learning with the version 3 branch, and there's one thing I don't understand (obviously because I'm not experienced in the subject).
In Lucene 2.9, if I wanted a list of tokens I would create an ArrayList of the Token class, ArrayList<Token> for example. That's pretty intuitive for me, and the concept of a token is very clear.
Now that use of the Token class is discouraged in favour of the Attribute-based API, do I have to create my own class to encapsulate the attributes I want? If yes, isn't that almost recreating Lucene's Token class?
I'm writing a class to test analyzers, and I think having a list of the resulting tokens would make testing easier.
Any help would be appreciated ;) Thank you!
Answer 1:
According to the Token Javadoc, "Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all Attributes, which is especially useful to easily switch from the old to the new TokenStream API."
I suggest you keep using Token. Collecting tokens into a list for a test is the kind of convenience use that the Javadoc describes.
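Answer 2:
Use the TermAttribute interface:
import java.io.StringReader;

// In Lucene 3.0, tokenStream() takes a Reader, not a String.
TokenStream stream = analyzer.tokenStream("field", new StringReader("text"));
TermAttribute termAttr = stream.getAttribute(TermAttribute.class);
while (stream.incrementToken()) {
    // The same attribute instance is reused, so read the term on every iteration.
    String token = termAttr.term();
}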
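One way to get the list the question asks for, without reusing Token, is to copy the attribute values you care about into a small holder on each iteration. This matters because a TokenStream reuses the same attribute instances between incrementToken() calls, so putting termAttr itself into a list would not keep the earlier terms. What follows is only a minimal sketch: CapturedToken is a hypothetical class (not part of Lucene), and reading offsets from OffsetAttribute is an assumption about which attributes you want. The Lucene loop appears only as a comment so the class compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder (not part of Lucene): an immutable snapshot of one token.
final class CapturedToken {
    final String term;
    final int startOffset;
    final int endOffset;

    CapturedToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Value equality, so lists of captured tokens can be compared in a test.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CapturedToken)) {
            return false;
        }
        CapturedToken other = (CapturedToken) o;
        return term.equals(other.term)
                && startOffset == other.startOffset
                && endOffset == other.endOffset;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * term.hashCode() + startOffset) + endOffset;
    }

    @Override
    public String toString() {
        return term + "[" + startOffset + "," + endOffset + ")";
    }
}

// Possible use inside the loop above (assumes Lucene 3.0 on the classpath):
//   List<CapturedToken> tokens = new ArrayList<CapturedToken>();
//   OffsetAttribute offsetAttr = stream.addAttribute(OffsetAttribute.class);
//   while (stream.incrementToken()) {
//       tokens.add(new CapturedToken(termAttr.term(),
//               offsetAttr.startOffset(), offsetAttr.endOffset()));
//   }
```

Because equals() is defined, a test can compare the collected list against an expected list directly. This does duplicate part of what Token already offers, which is the trade-off the question points out.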
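Answer 3:
I think you can do something like this:
import java.io.StringReader;

TokenStream tkst = analyzer.tokenStream("field", new StringReader("text"));
// Untested: attributes are normally looked up by interface (e.g. TermAttribute.class),
// so this lookup by the concrete Token class may fail at runtime.
Token token = tkst.getAttribute(Token.class);
while (tkst.incrementToken()) {
    // Do something with token.
}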
The proper documentation is in the analysis package: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/package-summary.html
Source: https://stackoverflow.com/questions/3916806/list-of-tokens-on-lucene-3