How to get frequently occurring phrases with Lucene

难免孤独 2020-12-16 05:54

I would like to get some frequently occurring phrases with Lucene. I am getting some information from TXT files, and I am losing a lot of context for not having information

3 answers

眼角桃花 (original poster)

2020-12-16 06:38
Could you post the code you have written so far?

A lot depends on how you define your fields and store documents in Lucene.

Let's consider a case with two fields, ID and Comments. The ID field holds values such as 'finding nemo', i.e. strings that contain spaces. Comments is a free-text field: it accepts anything my keyboard can produce and Lucene can process.

In a real-life scenario it makes no sense to split ID:'finding nemo' into two separately searchable strings, whereas I do want every word in Comments to be indexed.

So I create a Document (org.apache.lucene.document.Document) object to take care of this, something like:
```
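Document doc = new Document();
// Free text: run through the analyzer so that individual words become searchable terms.
doc.add(new Field("comments", "Finding nemo was a very tough job for a clown fish ...", Field.Store.YES, Field.Index.ANALYZED));
// Identifier: indexed as one term, spaces included, without analysis.
doc.add(new Field("id", "finding nemo", Field.Store.YES, Field.Index.NOT_ANALYZED));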
```
So, in essence, I have created two fields:
1. comments: analyzed, via Field.Index.ANALYZED, so its text is split into searchable terms.
2. id: stored and indexed as a single term, without analysis, via Field.Index.NOT_ANALYZED.
This is how you control indexing with Lucene's default tokenizer and analyzer. Alternatively, you can write your own tokenizer and analyzer.
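
To make the difference between the two fields concrete, here is a minimal plain-Java sketch. It does not use Lucene and is not how Lucene implements analysis; the class `FieldIndexingSketch` and its methods are hypothetical names, and the lowercase-and-split rule is only an assumed stand-in for an analyzer (a real analyzer may also drop stop words or apply other filters).

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene's implementation.
class FieldIndexingSketch {

    // Stand-in for an ANALYZED field: lowercase the text and split it on
    // non-alphanumeric characters, so each word becomes its own term.
    static Set<String> analyzedTerms(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Stand-in for a NOT_ANALYZED field: the whole value is kept as one term.
    static Set<String> notAnalyzedTerms(String value) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(value);
        return terms;
    }

    public static void main(String[] args) {
        Set<String> comments = analyzedTerms("Finding nemo was a very tough job for a clown fish ...");
        Set<String> id = notAnalyzedTerms("finding nemo");

        System.out.println(comments.contains("nemo"));   // true: a single word matches
        System.out.println(id.contains("nemo"));         // false: only the full value is a term
        System.out.println(id.contains("finding nemo")); // true: exact value matches
    }
}
```

Under these assumptions, a search for the single word nemo would match the comments field but not the id field, which only matches the exact value 'finding nemo'.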

Link: http://darksleep.com/lucene/

Hope this will help you... :)