List of “tokens” on Lucene 3

Submitted by 怎甘沉沦 on 2019-12-07 08:12:59

Question


I'm new to Lucene. I started learning the version 3 branch, and there's one thing I don't understand (obviously because I'm not experienced in the subject).

In Lucene 2.9, if I wanted a list of tokens, I would create an ArrayList of the Token class, ArrayList<Token> for example. That's pretty intuitive to me, and the concept of a token is very clear.

Now that use of the Token class is discouraged in favour of the Attribute-based API, do I have to create my own class to encapsulate the attributes I want? If so, isn't that almost recreating Lucene's Token class?

I'm writing a class to test analyzers, and having a list of the resulting tokens makes testing easier, I guess.

Any help would be appreciated ;) Thank you!


Answer 1:


According to the Token Javadoc, "Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all Attributes, which is especially useful to easily switch from the old to the new TokenStream API."

I suggest you keep using Token. Your use case matches the description above.




Answer 2:


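Use the TermAttribute interface:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// In Lucene 3.0, tokenStream() takes a Reader rather than a String.
TokenStream stream = analyzer.tokenStream("field", new StringReader("text"));
// addAttribute() returns the existing instance, or registers one if it is missing.
TermAttribute termAttr = stream.addAttribute(TermAttribute.class);
while (stream.incrementToken()) {
    String token = termAttr.term();
}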
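
If a test needs more than the term text, one possibility is a small value class of your own. The sketch below is not part of Lucene: TokenSnapshot and its fields are hypothetical names, and the values in main are hard-coded stand-ins for what would be copied from TermAttribute and OffsetAttribute inside the incrementToken() loop. Copying matters because a stream reuses the same attribute instances for every token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the token values a test cares about (not a Lucene class). */
public final class TokenSnapshot {
    private final String term;
    private final int startOffset;
    private final int endOffset;

    public TokenSnapshot(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public String term() { return term; }
    public int startOffset() { return startOffset; }
    public int endOffset() { return endOffset; }

    // Value equality lets a test compare whole lists of snapshots.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TokenSnapshot)) return false;
        TokenSnapshot that = (TokenSnapshot) other;
        return term.equals(that.term)
                && startOffset == that.startOffset
                && endOffset == that.endOffset;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * term.hashCode() + startOffset) + endOffset;
    }

    @Override
    public String toString() {
        return term + "[" + startOffset + "," + endOffset + ")";
    }

    public static void main(String[] args) {
        // Stand-in for the incrementToken() loop: with Lucene, these values would
        // come from TermAttribute.term() and OffsetAttribute.startOffset()/endOffset().
        List<TokenSnapshot> actual = new ArrayList<TokenSnapshot>();
        actual.add(new TokenSnapshot("hello", 0, 5));
        actual.add(new TokenSnapshot("world", 6, 11));

        List<TokenSnapshot> expected = new ArrayList<TokenSnapshot>();
        expected.add(new TokenSnapshot("hello", 0, 5));
        expected.add(new TokenSnapshot("world", 6, 11));

        System.out.println(actual.equals(expected)); // prints: true
        System.out.println(actual); // prints: [hello[0,5), world[6,11)]
    }
}
```

This is close to what Token already provides, which is the trade-off raised in the question: a custom holder keeps only the fields a test asserts on, while Token carries every attribute.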



Answer 3:


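I think you can do something like this. Note that Token cannot be requested through getAttribute(Token.class), because a stream registers its attributes by interface; read the attribute interfaces and copy their values into a Token instead:

TokenStream tkst = analyzer.tokenStream("field", new StringReader("text"));
TermAttribute term = tkst.addAttribute(TermAttribute.class);
OffsetAttribute offset = tkst.addAttribute(OffsetAttribute.class);
while (tkst.incrementToken()) {
    // The attribute instances are reused, so copy the current values.
    Token token = new Token(term.term(), offset.startOffset(), offset.endOffset());
    // Do something with token.
}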

The proper documentation is in the analysis package: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/package-summary.html



Source: https://stackoverflow.com/questions/3916806/list-of-tokens-on-lucene-3
