termfreq for a phrase

Posted by 笑着哭i on 2020-01-02 18:02:22

Question


I'm using the Solr 4.x termfreq function in the following example to find "autozero amplifiers" in the field CONTENTS.

http://localhost:8080/solr/select/?fl=contents,documentPageId,termfreq%28contents,%27autozero%20amplifiers%27%29&defType=func&q=termfreq%28contents,%27autozero%20amplifiers%27%29&fq=documentId%3A49667
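For readers who would rather build this request in code than by hand, here is a minimal sketch of how the same query string could be assembled. The helper name `termfreq_url` is hypothetical; the host, field names, and document id are simply the ones from the URL above.

```python
from urllib.parse import urlencode, quote

def termfreq_url(base, field, term, doc_id):
    """Build a select URL that sorts by and returns termfreq(field,'term').

    Hypothetical helper: parameter names mirror the URL in the question.
    """
    func = f"termfreq({field},'{term}')"
    params = {
        "fl": f"{field},documentPageId,{func}",  # fields to return, plus the function value
        "defType": "func",                       # treat q as a function query
        "q": func,
        "fq": f"documentId:{doc_id}",            # restrict to one document
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + "?" + urlencode(params, quote_via=quote)

url = termfreq_url("http://localhost:8080/solr/select/",
                   "contents", "autozero amplifiers", 49667)
print(url)
```

This only builds the request; it does not change how Solr interprets the phrase, so the zero count described below would still be expected.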

I am getting zero frequency for the following paragraph which contains the phrase "autozero amplifiers".

What do I have to change in solrconfig.xml or schema.xml in order to use termfreq on a phrase, not just on a single word such as "amplifier"?


Answer 1:


Unless you make Lucene treat "autozero amplifiers" as a single term, you can't use term vectors to get what you are looking for. You could index with KeywordTokenizerFactory, which doesn't actually tokenize the words; it preserves the entire stream of text as one token. But suppose the field you are interested in contains the following text:

 "The quick brown fox jumps over the lazy dog"

How do you define your term boundaries?

 The quick
 The quick brown
 quick brown
 quick brown fox jumps
 over the lazy dog
 .....

The number of combinations grows very quickly, even for a single field value. Since I have been answering some of your questions about term vectors leading up to this one, my guess is that you are trying to bend Solr/Lucene into counting a word or set of words in a large document. You could consider integrating Solr with Hadoop and letting Hadoop do all the counting for you. Heck, every Hadoop example talks about word count and line count. Solr + Hadoop = Big Data Love. Or perhaps you could do it in your own app layer.
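To make the term-boundary problem concrete, here is a toy sketch in Python. It is not Lucene code: `word_terms` is a hypothetical stand-in for a word tokenizer, and the `Counter` only imitates a per-term frequency lookup. It shows why a phrase lookup against word-sized terms comes back as zero, and how many candidate "phrase terms" even a nine-word field produces.

```python
from collections import Counter

def word_terms(text):
    """Toy stand-in for a word tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def contiguous_phrases(words):
    """Every run of one or more adjacent words, i.e. every candidate phrase term."""
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

text = "The quick brown fox jumps over the lazy dog"
words = word_terms(text)

# term -> frequency, roughly what a word-tokenized field can answer
index = Counter(words)

print(index["the"])          # 2
print(index["quick brown"])  # 0, because no single term equals the phrase
print(len(words), len(contiguous_phrases(words)))  # 9 45
```

Storing every contiguous phrase as its own term would mean 45 entries for this one sentence, which is the boundary problem described above.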

I don't have much information about your application's data volume, requirements, goals, etc., so this is a suggestion at best.
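As a rough illustration of the "own app layer" option, the sketch below counts a phrase in text that has already been fetched from Solr. The function name `phrase_count` and the sample string are made up for this example, and the matching rules (case-insensitive, whole words, any whitespace between words) are assumptions that may not match your analyzer.

```python
import re

def phrase_count(text, phrase):
    """Count non-overlapping, case-insensitive occurrences of `phrase`.

    Words must match whole, and any run of whitespace may separate them.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return 0
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Made-up sample text, standing in for the stored value of the contents field
sample = ("Autozero amplifiers reduce offset. Most autozero\n"
          "amplifiers use chopping; one amplifier is not enough.")

print(phrase_count(sample, "autozero amplifiers"))  # 2
```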




Answer 2:


You may try the following trick:

  1. Call termfreq() on each of the words individually and use sum() to add up the counts.

  2. Further, you may use if() to check the values.

Hope this works for your requirement.
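One possible reading of the two steps above, sketched as Python helpers that build the function-query strings. The helper names are hypothetical. The expressions use Solr's termfreq(), sum(), if(), and and() functions, and they assume each word is passed in the form it was indexed in (for example after lowercasing or stemming). Note that this counts the words independently; it does not check that they are adjacent, so it is only a proxy for a true phrase frequency.

```python
def tf(field, term):
    """termfreq() call for a single term."""
    return f"termfreq({field},'{term}')"

def summed_termfreq(field, phrase):
    """Step 1: sum the per-word term frequencies."""
    parts = [tf(field, w) for w in phrase.split()]
    return "sum(" + ",".join(parts) + ")"

def guarded_termfreq(field, phrase):
    """Step 2: return the sum only if every word occurs at least once, else 0."""
    parts = ",".join(tf(field, w) for w in phrase.split())
    return f"if(and({parts}),{summed_termfreq(field, phrase)},0)"

print(summed_termfreq("contents", "autozero amplifiers"))
# sum(termfreq(contents,'autozero'),termfreq(contents,'amplifiers'))
print(guarded_termfreq("contents", "autozero amplifiers"))
```

Either string could then be used as the value of the fl or q parameter in place of the single termfreq() call from the question.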



Source: https://stackoverflow.com/questions/9024670/termfreq-for-a-phrase
