Sort Facets by Index with non-ASCII values

Submitted by 混江龙づ霸主 on 2021-01-28 03:26:31

Question


We have a field 'facet_tag' that contains tags describing a product. Since the tags are in German, they may contain non-ASCII characters (such as umlauts). Here are some possible values:

"Zelte"
"Tunnelzelte"
"Äxte"
"Sägen"
"Softshells"

Now if we query Solr for the facets with a request like:

http://<solr_host>:<solr_port>/solr/select?q=*&facet=on&facet.field=facet_tag&facet.sort=index

The sorted result looks like this:

<lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
        <lst name="facet_tag">
            <int name="Softshells">1</int>
            <int name="Sägen">1</int>
            <int name="Tunnelzelte">1</int>
            <int name="Zelte">1</int>
            <int name="Äxte">2</int>
        </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
</lst>

The tag "Äxte" should be the first item, followed by "Sägen". With facet.sort=index, Solr returns the values in lexicographic order of the indexed terms, which is only alphabetical for ASCII characters, so "Ä" ends up after "Z" (this is also stated in the documentation for faceted search, see http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort).
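The order in the response is what a plain code-point comparison produces. As a small illustration (not Solr code, just the same five tags sorted in Python, which also compares strings by code point), the following sketch reproduces it. The NFC normalization step is an assumption added so that the umlauts are single precomposed characters regardless of how the text was typed.

```python
import unicodedata

# The example tags from the question, normalized to NFC so that
# "Ä" is the single code point U+00C4 (an assumption about the input).
tags = [
    unicodedata.normalize("NFC", t)
    for t in ["Zelte", "Tunnelzelte", "Äxte", "Sägen", "Softshells"]
]

# Python compares str values code point by code point, which is
# comparable to a lexicographic sort of the indexed terms.
code_point_order = sorted(tags)

for tag in code_point_order:
    # Show the first two code points to make the comparison visible.
    print(tag, [hex(ord(ch)) for ch in tag[:2]])
```

"Ä" is U+00C4, which is greater than "Z" (U+005A), and "ä" (U+00E4) is greater than "o" (U+006F), which is why "Äxte" lands last and "Sägen" follows "Softshells".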

Is there any way to let Solr sort these values properly without normalizing umlauts (since we show the values to the user)?


Answer 1:


I would use ASCIIFoldingFilterFactory:

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.

This way the indexed value is normalized (for example, Äxte is indexed as Axte), but the stored value doesn't change. You should therefore get the expected sort order, while the content you display is still the original (Äxte, for example).

UPDATE
This solution doesn't apply to facets, since they use the indexed values. With ASCIIFoldingFilterFactory you get the right sort order, but the facet output contains the normalized characters as well. Basically, you can have either the right sort order with the wrong output, or the wrong sort order with the right output. Unfortunately, I don't know of any other solution.
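One possible way around this trade-off, which is not part of the original answer, is to leave the field unfolded so the facet output keeps the umlauts, and to re-sort the returned facet values on the client using a folded sort key. The sketch below is only that, a sketch under assumptions: the helper names fold_key and sort_facets are hypothetical, the facet values are assumed to be already parsed into (value, count) pairs, and stripping diacritics is only an approximation of German alphabetical order. It also only works if all facet values are fetched, not a single page of them.

```python
import unicodedata


def fold_key(value: str) -> str:
    """Hypothetical helper: build a sort key by stripping diacritics.

    Decomposes the string (NFD), drops combining marks, and casefolds,
    so "Äxte" yields "axte". The original value is left untouched.
    """
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_facets(facets):
    """Hypothetical helper: sort (value, count) pairs by folded key.

    The original value is used as a tie-breaker so the order is stable
    for values that fold to the same key.
    """
    return sorted(facets, key=lambda item: (fold_key(item[0]), item[0]))


# Facet values in the order Solr returned them in the question.
facet_counts = [
    ("Softshells", 1),
    ("Sägen", 1),
    ("Tunnelzelte", 1),
    ("Zelte", 1),
    ("Äxte", 2),
]

sorted_facets = sort_facets(facet_counts)
for value, count in sorted_facets:
    print(value, count)
```

The displayed values keep their umlauts because folding is applied only to the sort key, never to the value itself.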



Source: https://stackoverflow.com/questions/11245068/sort-facets-by-index-with-non-ascii-values
