Solr highlighting does not work with multiple fields hl.fl when dynamic field is present

Submitted by 半腔热情 on 2019-12-23 15:44:59

Question


I have a dynamic text field bar_* in my index and want Solr to return highlighting for that field. So what I run is:

q=gold&hl=true&hl.fl=bar_*

This works as expected, BUT as soon as I add more fields to hl.fl it stops working. E.g.

q=gold&hl=true&hl.fl=bar_*,foo

Notes:

  • bar_* and foo fields are in the index/schema and there is no error here.
  • just rewriting request as q=gold&hl=true&hl.fl=bar_*&hl.fl=foo or q=gold&hl=true&hl.fl=bar_* foo does NOT help.
  • I didn't find any bugs in Solr JIRA on that topic.

Does anyone have an idea how to beat this? The possible workarounds that I see are:

  1. Use hl.fl=*. But this one is not good for performance.
  2. Explicitly specify all possible field names for my dynamic field. But I don't like that at all.

Answer 1:


I don't know which version you are using, but this looks like a bug in earlier Solr versions. I can confirm that in Solr 7.3 this works as expected.

curl -X GET \
  'http://localhost:8983/solr/test/select?q=x_ggg:Test1%20OR%20bar_x:Test2&hl=true&hl.fl=%2A_ggg,foo,bar_%2A' \
  -H 'cache-control: no-cache'

The more correct way is to write hl.fl=bar_*,foo,*_ggg (use , or space as the delimiter).

This helps you avoid a long debugging session when you later remove the asterisk from your hl.fl parameter and highlighting by field stops working, because the value is no longer processed as a regex.

Here are the spots in the Solr 7.3 sources where we can trace this behavior:

  1. Solr calls org.apache.solr.highlight.SolrHighlighter#getHighlightFields
  2. Before the fields are processed, the value is split by , or space here: org.apache.solr.util.SolrPluginUtils#split
  private final static Pattern splitList=Pattern.compile(",| ");

  /** Split a value that may contain a comma, space of bar separated list. */
  public static String[] split(String value){
     return splitList.split(value.trim(), 0);
  }
  3. The results of the split go to the method org.apache.solr.highlight.SolrHighlighter#expandWildcardsInHighlightFields.
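The flow traced in the steps above can be sketched in plain Java. This is a minimal illustration, not Solr's actual code: the delimiter pattern is the one quoted from SolrPluginUtils#split, while the class HlFlExpansionSketch, the method expand, and the sample field names are hypothetical. It assumes each entry containing * is turned into a regex by replacing * with .* and matched against the known field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class HlFlExpansionSketch {
    // Same delimiter pattern as the SolrPluginUtils#split snippet quoted above.
    private static final Pattern SPLIT_LIST = Pattern.compile(",| ");

    static String[] split(String value) {
        return SPLIT_LIST.split(value.trim(), 0);
    }

    // Hypothetical stand-in for the wildcard expansion step: every entry is
    // handled on its own, so globs and plain names can be mixed freely.
    static List<String> expand(String hlFl, List<String> storedFields) {
        List<String> result = new ArrayList<>();
        for (String entry : split(hlFl)) {
            if (entry.isEmpty()) {
                continue; // "a, b" yields an empty token between ',' and ' '
            }
            if (entry.contains("*")) {
                // Assumption: a glob becomes a regex by replacing '*' with '.*'.
                String regex = entry.replace("*", ".*");
                for (String field : storedFields) {
                    if (field.matches(regex) && !result.contains(field)) {
                        result.add(field);
                    }
                }
            } else if (!result.contains(entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up field names, for illustration only.
        List<String> fields = Arrays.asList("bar_x", "bar_y", "foo", "x_ggg", "title");
        System.out.println(expand("bar_*,foo,*_ggg", fields));
        System.out.println(expand("bar_* foo", fields));
    }
}
```

Because the value is split before any wildcard handling, the comma- and space-delimited forms resolve to the same set of fields in this sketch.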

The documentation also states the expected contract: https://lucene.apache.org/solr/guide/7_3/highlighting.html

hl.fl Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.

A wildcard of * (asterisk) can be used to match field globs, such as text_* or even * to highlight on all fields where highlighting is possible. When using *, consider adding hl.requireFieldMatch=true.

When not defined, the defaults defined for the df query parameter will be used.




Answer 2:


Try:

q=gold&hl=true&hl.fl=bar_*&hl.fl=foo



Answer 3:


After digging into the Solr sources (org.apache.solr.highlight.SolrHighlighter#getHighlightFields) I have found a workaround for this. It appears that Solr interprets the hl.fl content as a regular expression pattern. So I've specified hl.fl as:

hl.fl=bar_*|foo

I.e. using | instead of comma. That worked perfectly for me.

Btw, I have found no documentation of this on the internet.
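Why | would work where a comma does not can be illustrated with a small Java sketch. It rests on an assumption drawn from this answer, not on verified Solr code: that the affected version turns the whole hl.fl value into one regex (replacing * with .*) without splitting it first. The class WholeValueRegexSketch, the method matchWholeValue, and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WholeValueRegexSketch {
    // Assumption: the entire hl.fl value is treated as a single pattern,
    // with '*' replaced by '.*', and matched against each field name.
    static List<String> matchWholeValue(String hlFl, List<String> storedFields) {
        String regex = hlFl.replace("*", ".*");
        List<String> result = new ArrayList<>();
        for (String field : storedFields) {
            if (field.matches(regex)) {
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up field names, for illustration only.
        List<String> fields = Arrays.asList("bar_x", "bar_y", "foo", "title");
        // "bar_.*|foo" is a regex alternation, so both kinds of field match.
        System.out.println(matchWholeValue("bar_*|foo", fields));
        // "bar_.*,foo" demands a literal ",foo" suffix, so no field matches.
        System.out.println(matchWholeValue("bar_*,foo", fields));
    }
}
```

Under that assumption, the comma form matches nothing because no field name contains a literal comma, which is consistent with the symptom described in the question.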



Source: https://stackoverflow.com/questions/47690813/solr-highlighting-does-not-work-with-multiple-fields-hl-fl-when-dynamic-field-is
