Creating custom FunctionQuery in Solr

Submitted by 喜夏-厌秋 on 2020-03-05 01:30:56

Question


I want to create a custom Solr FunctionQuery so that I can get the actual length of a field (in terms). The results might look like this:

{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"python",
      "indent":"on",
      "fl":"title,score,[features efi.query=python store=myfeature_store]",
      "wt":"json"}},
  "response":{"numFound":793,"start":0,"maxScore":0.33828905,"docs":[
      {
        "title":"Newest 'python' Questions - Stack Overflow",
        "score":0.33828905,
        "[features]":"titleLength=5"}
      ]
  }}

The only helpful link I was able to find is this, but it does not explain the topic very well. I'm very new to Solr, so a step-by-step procedure would be helpful.

EDIT

I've created a JavaScript file called count.js as follows:

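function WordCount(str) {
  // Coerce to a JavaScript string, trim it, and split on runs of whitespace.
  var text = String(str).trim();
  return text.length === 0 ? 0 : text.split(/\s+/).length;
}

function processAdd(cmd) {
    var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
    var title = doc.getFieldValue("title");
    // Documents without a title get a count of 0 instead of throwing.
    var count = (title == null) ? 0 : WordCount(title);
    doc.setField("title_count", count);
    logger.info("count-script#count: title_count=" + count);
}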

function processDelete(cmd) {
  // no-op
}

function processMergeIndexes(cmd) {
  // no-op
}

function processCommit(cmd) {
  // no-op
}

function processRollback(cmd) {
  // no-op
}

function finish() {
  // no-op
}
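
The counting logic can be tried outside Solr before the script is deployed. The following is only a sketch: makeFakeDoc is a hypothetical stand-in for SolrInputDocument that models just getFieldValue and setField, and countWords and addTitleCount are illustrative names, not part of Solr.

```javascript
// Hypothetical stand-in for org.apache.solr.common.SolrInputDocument;
// only the two methods the script calls are modelled.
function makeFakeDoc(fields) {
  return {
    getFieldValue: function (name) {
      return Object.prototype.hasOwnProperty.call(fields, name) ? fields[name] : null;
    },
    setField: function (name, value) { fields[name] = value; },
    fields: fields
  };
}

// Count whitespace-separated words; empty or missing values count as 0.
function countWords(value) {
  if (value === null || value === undefined) return 0;
  var text = String(value).trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

// Same shape as the processAdd hook: read "title", write "title_count".
function addTitleCount(cmd) {
  var doc = cmd.solrDoc;
  doc.setField("title_count", countWords(doc.getFieldValue("title")));
  return doc;
}

var fakeDoc = makeFakeDoc({ title: "Newest 'python' Questions - Stack Overflow" });
var result = addTitleCount({ solrDoc: fakeDoc });
console.log(result.fields.title_count); // 6
```

Splitting on whitespace counts the lone "-" as a word, so this title yields 6 here, while the sample response above shows titleLength=5. Which number is wanted depends on how the field is analyzed, so a whitespace split may only approximate the length in terms.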

Also, I've added the following entries to solrconfig.xml:

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">script</str>
  </lst>
</initParams>

<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">count.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I have a few questions now:

  1. For this to work, do I have to re-index the documents using Nutch?
  2. How can I check whether it's working? Will a simple Solr query such as http://localhost:8983/solr/nutch/select?indent=on&q=*:*&wt=json work?
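
One possible way to check is to request only title and title_count and see which documents carry the new field. Update processors run when a document is added, so documents indexed before the chain was configured would not be expected to have it. The sketch below assumes title_count is declared in the schema as a stored field; buildCheckUrl and summarize are hypothetical helper names, and the sample response is made up purely for illustration.

```javascript
// Build a select URL that asks only for the fields needed to verify the script.
// Assumes "title_count" is declared in the schema as a stored field.
function buildCheckUrl(baseUrl, core, rows) {
  var url = new URL(baseUrl + "/solr/" + core + "/select");
  url.searchParams.set("q", "*:*");
  url.searchParams.set("fl", "title,title_count");
  url.searchParams.set("rows", String(rows));
  url.searchParams.set("wt", "json");
  return url.toString();
}

// Given a parsed Solr JSON response, count how many returned docs carry the field.
function summarize(response) {
  var docs = response.response.docs;
  var withCount = docs.filter(function (d) {
    return d.title_count !== undefined;
  }).length;
  return {
    returned: docs.length,
    withCount: withCount,
    withoutCount: docs.length - withCount
  };
}

// Host, port and core name are the ones used in the question.
console.log(buildCheckUrl("http://localhost:8983", "nutch", 5));

// Hypothetical response, used only to show what summarize() reports.
var sample = { response: { numFound: 2, start: 0, docs: [
  { title: "Newest 'python' Questions - Stack Overflow", title_count: 6 },
  { title: "Indexed before the chain existed" }
] } };
console.log(summarize(sample));
```

If withoutCount stays above zero after new documents are posted through /update, the chain is probably not being applied to those requests.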

Answer 1:


You could use an Update Request Processor. There are quite a few ways to do it.

Check out the CountFieldValuesUpdateProcessorFactory.

You basically clone your field and count the values of the clone. But this works only when your source field is multi-valued; that is, you tokenize the value before feeding it to Solr. You configure this in your solrconfig.xml:

<updateRequestProcessorChain name="word-counter">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="dest">title_count</str>
  </processor>
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">title_count</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">title_count</str>
    <int name="value">0</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

But remember, this requires "title" to be multi-valued, which may not be ideal. You can instead add an additional field, something like "title_multi", and run the clone-and-count steps against that field so that "title" itself stays unchanged.
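
As a rough illustration of tokenizing before feeding the document to Solr, the sketch below builds a document whose "title_multi" field holds one value per word. It assumes "id" is the unique-key field and that "title_multi" is declared as multi-valued in the schema; tokenizeTitle and toSolrDoc are hypothetical helper names.

```javascript
// Split a title into whitespace-separated tokens; empty titles give no tokens.
function tokenizeTitle(title) {
  var text = String(title == null ? "" : title).trim();
  return text === "" ? [] : text.split(/\s+/);
}

// Build the document to send to Solr. "id" is an assumed unique-key field;
// "title_multi" is the extra multi-valued field suggested in the answer.
function toSolrDoc(id, title) {
  return { id: id, title: title, title_multi: tokenizeTitle(title) };
}

var exampleDoc = toSolrDoc("doc-1", "Creating custom FunctionQuery in Solr");
console.log(JSON.stringify(exampleDoc));
console.log(exampleDoc.title_multi.length); // 5
```

With a document shaped like this, the source of the CloneFieldUpdateProcessorFactory step would presumably be "title_multi" rather than "title", and the count stored in "title_count" would be the number of values in that field.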

Alternatively, you can use a ScriptUpdateProcessor and perform your counting logic in JavaScript.



Source: https://stackoverflow.com/questions/60245288/creating-custom-functionquery-in-solr
