Question
I want to create a custom Solr FunctionQuery so that I can get the actual length of a field (in terms). The results might look like this:
{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"python",
      "indent":"on",
      "fl":"title,score,[features efi.query=python store=myfeature_store]",
      "wt":"json"}},
  "response":{"numFound":793,"start":0,"maxScore":0.33828905,"docs":[
      {
        "title":"Newest 'python' Questions - Stack Overflow",
        "score":0.33828905,
        "[features]":"titleLength=5"}
    ]
  }}
The only helpful link I've been able to find is this one, but it does not explain the topic very well. I'm very new to Solr, so a step-by-step procedure would be helpful.
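If the response is consumed from JavaScript, the [features] pseudo-field in the sample response comes back as a single string, not as separate fields. The sketch below turns that string into an object. It assumes the comma-separated name=value layout (name1=value1,name2=value2) seen in the sample, and parseFeatures is a hypothetical helper name, not part of any Solr client.

```javascript
// Sketch: turn the "[features]" string from an LTR response into an object.
// Assumes comma-separated "name=value" pairs, e.g. "titleLength=5,other=0.3".
function parseFeatures(featureString) {
  const features = {};
  if (!featureString) return features;
  for (const pair of featureString.split(",")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue; // skip entries that are not name=value
    const name = pair.slice(0, eq).trim();
    features[name] = Number(pair.slice(eq + 1));
  }
  return features;
}

// Example using the document from the sample response
const doc = {
  title: "Newest 'python' Questions - Stack Overflow",
  score: 0.33828905,
  "[features]": "titleLength=5",
};
console.log(parseFeatures(doc["[features]"]).titleLength); // 5
```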
EDIT
I've created a JavaScript file called count.js as follows:
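function WordCount(str) {
  // Count whitespace-separated words; runs of spaces don't produce empty "words".
  // Note: this may differ from the number of terms the field's analyzer produces.
  var trimmed = String(str).trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}
function processAdd(cmd) {
  var doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument (local, not an implicit global)
  var title = doc.getFieldValue("title");
  // A document without a title gets 0 instead of failing on a null value
  var count = (title == null) ? 0 : WordCount(title);
  doc.setField("title_count", count);
  logger.info("count-script#count: title_count=" + count);
}
function processDelete(cmd) {
  // no-op
}
function processMergeIndexes(cmd) {
  // no-op
}
function processCommit(cmd) {
  // no-op
}
function processRollback(cmd) {
  // no-op
}
function finish() {
  // no-op
}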
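One way to sanity-check counting logic like this before deploying it is to run it in Node.js with stub objects standing in for Solr's cmd, SolrInputDocument and logger. The stubs below are hypothetical and only imitate the two document methods and the one logger method the script calls; they are not Solr's classes. The functions are a variant of count.js that splits on any run of whitespace and returns 0 when the title is missing, repeated here so the snippet runs on its own.

```javascript
// Sketch: exercise count.js-style logic in Node.js with hypothetical stubs.

// --- counting logic (whitespace split, 0 for a missing title) ---
function WordCount(str) {
  var trimmed = String(str).trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var title = doc.getFieldValue("title");
  var count = (title == null) ? 0 : WordCount(title);
  doc.setField("title_count", count);
  logger.info("count-script#count: title_count=" + count);
}

// --- hypothetical stubs, NOT the real Solr objects ---
var logger = { info: function (msg) { console.log("[INFO] " + msg); } };

function makeStubDoc(fields) {
  var values = Object.assign({}, fields);
  return {
    // Returns undefined for a missing field; Solr returns null, and "== null" covers both
    getFieldValue: function (name) { return values[name]; },
    setField: function (name, value) { values[name] = value; },
    fields: values
  };
}

var withTitle = makeStubDoc({ title: "Newest  python Questions" });
processAdd({ solrDoc: withTitle });
console.log(withTitle.fields.title_count); // 3

var noTitle = makeStubDoc({});
processAdd({ solrDoc: noTitle });
console.log(noTitle.fields.title_count); // 0
```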
I've also added the following entries to solrconfig.xml:
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">script</str>
  </lst>
</initParams>
<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">count.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I have a few questions now:
- For this to work, do I have to re-index the documents using Nutch?
- How can I check whether it's working? Will a simple Solr query such as http://localhost:8983/solr/nutch/select?indent=on&q=*:*&wt=json work?
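Update request processors run only when a document is sent to an /update handler, so documents indexed before the chain was configured will not have title_count until they are indexed again. After re-indexing, a query that requests the field in fl shows whether the script ran. The sketch below assumes Node.js 18+ (for the global fetch), the core name nutch from the question's URL, and that title_count is stored so fl can return it; buildCheckUrl, docsMissingCount and checkTitleCounts are hypothetical helper names.

```javascript
// Sketch: ask Solr whether returned documents carry a title_count value.
// Assumes Node.js 18+ (global fetch) and the "nutch" core from the question.

function buildCheckUrl(baseUrl, core) {
  const url = new URL(`${baseUrl}/solr/${core}/select`);
  url.searchParams.set("q", "*:*");
  url.searchParams.set("fl", "title,title_count"); // only ask for the fields of interest
  url.searchParams.set("rows", "10");
  url.searchParams.set("wt", "json");
  return url.toString();
}

// Returns the documents that came back without a title_count value.
function docsMissingCount(solrJson) {
  return solrJson.response.docs.filter((d) => d.title_count === undefined);
}

async function checkTitleCounts(baseUrl = "http://localhost:8983", core = "nutch") {
  const res = await fetch(buildCheckUrl(baseUrl, core));
  if (!res.ok) throw new Error(`Solr returned HTTP ${res.status}`);
  const missing = docsMissingCount(await res.json());
  console.log(missing.length === 0
    ? "all returned documents have title_count"
    : `${missing.length} returned documents lack title_count`);
}

// checkTitleCounts().catch(console.error);
```

The same check works in a browser by adding fl=title,title_count to the query URL from the question. The count-script#count lines written by logger.info should also show up in the Solr log while documents are being indexed.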
Answer 1:
You could use an update request processor; there are quite a few ways to do it.
Check out the CountFieldValuesUpdateProcessorFactory.
You basically clone your field and count the values in the clone. However, this works only when your source field is multi-valued, that is, when you have already split the title into separate values (tokens) before feeding it to Solr. You configure this in your solrconfig.xml:
<updateRequestProcessorChain name="word-counter">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="dest">title_count</str>
  </processor>
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">title_count</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">title_count</str>
    <int name="value">0</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
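To see why the source field has to be multi-valued, the sketch below imitates in plain JavaScript the effect these three processors have on a single document: clone the title values, replace them with their count, and fall back to 0 when there were none. It is not Solr code, simulateWordCounterChain is a hypothetical name, and it models title as either one string or an array of values.

```javascript
// Sketch: plain-JavaScript imitation of the "word-counter" chain's effect on one document.
// NOT Solr code; it only mirrors what the three processors do to the fields.
function simulateWordCounterChain(doc) {
  const out = { ...doc }; // leave the caller's object untouched
  // CloneFieldUpdateProcessorFactory: copy every value of title into title_count
  if (out.title !== undefined) {
    out.title_count = Array.isArray(out.title) ? [...out.title] : [out.title];
  }
  // CountFieldValuesUpdateProcessorFactory: replace the list of values with its size
  if (out.title_count !== undefined) {
    out.title_count = out.title_count.length;
  }
  // DefaultValueUpdateProcessorFactory: a document with no title_count gets 0
  if (out.title_count === undefined) {
    out.title_count = 0;
  }
  return out;
}

console.log(simulateWordCounterChain({ title: "Newest python Questions" }).title_count); // 1
console.log(simulateWordCounterChain({ title: ["Newest", "python", "Questions"] }).title_count); // 3
console.log(simulateWordCounterChain({}).title_count); // 0
```

A single-valued title counts as one value no matter how many words it contains; only a title already split into separate values yields a word count.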
But remember, this requires "title" to be multi-valued, which may not be ideal. Instead, you can add a separate field, something like "title_multi", and point the process at that field.
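If you control the indexing client, one way to follow this suggestion is to tokenize the title yourself and send the tokens in a separate field while title stays single-valued. This sketch assumes Node.js 18+, a core named nutch, a uniqueKey field called id, a schema in which title_multi is multi-valued, and a word-counter chain whose CloneFieldUpdateProcessorFactory source has been changed from title to title_multi. The helpers toUpdateDoc and sendDocs and the id "doc-1" are hypothetical.

```javascript
// Sketch: tokenize the title client-side and send the tokens as a multi-valued field.
// Hypothetical setup: title_multi is multiValued in the schema, and the word-counter
// chain clones from title_multi instead of title.
function toUpdateDoc(id, title) {
  const trimmed = title.trim();
  const tokens = trimmed === "" ? [] : trimmed.split(/\s+/); // whitespace tokens
  return { id, title, title_multi: tokens };
}

async function sendDocs(docs, baseUrl = "http://localhost:8983", core = "nutch") {
  const url = new URL(`${baseUrl}/solr/${core}/update`);
  url.searchParams.set("update.chain", "word-counter"); // route these docs through the chain
  url.searchParams.set("commit", "true");
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(docs), // a JSON array of documents
  });
  if (!res.ok) throw new Error(`Solr returned HTTP ${res.status}`);
  return res.json();
}

const doc = toUpdateDoc("doc-1", "Newest python Questions");
console.log(doc.title_multi.length); // 3
// sendDocs([doc]).then(console.log).catch(console.error);
```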
Alternatively, you can use a script update processor, such as the StatelessScriptUpdateProcessorFactory, and perform your counting logic in JavaScript.
Source: https://stackoverflow.com/questions/60245288/creating-custom-functionquery-in-solr