How to get frequently occurring phrases with Lucene

难免孤独 2020-12-16 05:54

I would like to get frequently occurring phrases with Lucene. I am extracting information from TXT files, and I am losing a lot of context by not having information

3 answers

天涯浪人 (OP)

2020-12-16 07:00

Julia, it seems what you are looking for is n-grams, specifically bigrams (word pairs that co-occur more often than chance are called collocations).
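
As a starting point, the most frequent bigrams can be counted without any library. The sketch below is plain Java, not Lucene; the crude tokenizer (lowercase, split on anything that is not a letter or digit) and the names BigramCounter, tokenize, countBigrams and topK are illustrative assumptions.

```java
import java.util.*;
import java.util.stream.*;

/** Counts adjacent word pairs (bigrams) and returns the most frequent ones. */
public class BigramCounter {

    /** Crude tokenizer (an assumption): lowercase, split on runs of non-letter/non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // a leading delimiter yields an empty first piece
        }
        return tokens;
    }

    /** Maps each bigram "w1 w2" to the number of times it occurs. */
    static Map<String, Integer> countBigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the k most frequent bigrams; ties are broken alphabetically. */
    static List<String> topK(Map<String, Integer> counts, int k) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "New York is big. I love New York. New York never sleeps.";
        System.out.println(topK(countBigrams(tokenize(text)), 3));
        // prints [new york, big i, i love]
    }
}
```

Note that this naive version counts pairs across sentence boundaries ("big i"), and on real text raw counts are dominated by pairs of function words such as "of the", which is why collocation finding usually adds a filter or a statistical score on top of plain frequency.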

Here's a chapter on finding collocations (PDF) from Manning and Schütze's Foundations of Statistical Natural Language Processing.
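
One of the measures that chapter discusses for telling genuine collocations apart from pairs of merely common words is pointwise mutual information, PMI(x, y) = log2( P(x y) / (P(x) P(y)) ). Below is a minimal plain-Java sketch of ranking bigrams by PMI. The probability estimates (both divided by the token count N), the minCount cutoff and the names PmiCollocations and pmi are simplifying assumptions, not taken from the chapter.

```java
import java.util.*;

/** Ranks bigrams by pointwise mutual information (PMI). */
public class PmiCollocations {

    /** Crude tokenizer (an assumption): lowercase, split on non-letter/non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /**
     * PMI(x, y) = log2( P(x y) / (P(x) * P(y)) ), with P(x) = c(x) / N and
     * P(x y) = c(x y) / N, where N is the number of tokens (a simplification).
     * Bigrams seen fewer than minCount times are skipped, since PMI overrates rare pairs.
     */
    static Map<String, Double> pmi(List<String> tokens, int minCount) {
        Map<String, Integer> unigrams = new HashMap<>();
        Map<String, Integer> bigrams = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            }
        }
        double n = tokens.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : bigrams.entrySet()) {
            if (e.getValue() < minCount) continue;
            String[] pair = e.getKey().split(" "); // tokens contain no spaces
            double pxy = e.getValue() / n;
            double px = unigrams.get(pair[0]) / n;
            double py = unigrams.get(pair[1]) / n;
            scores.put(e.getKey(), Math.log(pxy / (px * py)) / Math.log(2));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("strong tea is good and strong tea is cheap");
        pmi(tokens, 2).entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .forEach(e -> System.out.printf("%-12s %.3f%n", e.getKey(), e.getValue()));
    }
}
```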

To do this with Lucene, I suggest using Solr with ShingleFilterFactory, which builds word n-grams ("shingles") from the token stream at analysis time. Please see this discussion for details.
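
ShingleFilterFactory configures Lucene's ShingleFilter, whose minShingleSize, maxShingleSize and outputUnigrams options decide which n-gram sizes end up as terms in the field. The sketch below is NOT Lucene code: it is a plain-Java imitation of those three options (the class ShingleSketch and method shingles are hypothetical) meant only to show which terms a shingled field would contain, so that frequent phrases become frequent terms of that field.

```java
import java.util.*;

/**
 * Plain-Java illustration of word shingles (token n-grams), modelled on the
 * minShingleSize / maxShingleSize / outputUnigrams options of Lucene's ShingleFilter.
 * Not Lucene's implementation; details such as filler tokens are ignored.
 */
public class ShingleSketch {

    static List<String> shingles(List<String> tokens, int minSize, int maxSize, boolean outputUnigrams) {
        if (minSize < 2 || maxSize < minSize) {
            throw new IllegalArgumentException("need 2 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) out.add(tokens.get(i)); // the single word at this position
            // shingles starting at position i, shortest first, only if enough tokens remain
            for (int size = minSize; size <= maxSize && i + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, 3, false));
        // prints [please divide, please divide this, divide this, divide this sentence, this sentence]
    }
}
```

Setting outputUnigrams to true (ShingleFilter's default) interleaves the single words with the shingles, which keeps ordinary one-word queries working on the same field.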
