Azure Search - Accent insensitive analyzer not working when sorting

Posted by 吃可爱长大的小学妹 on 2021-02-19 05:27:33

Question


I'm using Azure Search. I have a model with a property that has these attributes:

[IsRetrievable(true), IsSearchable, IsSortable, Analyzer("standardasciifolding.lucene")]
public string Title { get; set; }

I want the search to be accent insensitive. That works when searching and filtering, but not when sorting the results. If I have words that start with an accented character and I sort alphabetically, those results appear at the end of the list.


Answer 1:


I verified your use case by creating an index with an Id field and a Title field that uses the standardasciifolding.lucene analyzer. I then submitted four sample records via the REST API:

{
"value": [
    {
        "@search.action": "mergeOrUpload",
        "Id": "1",
        "Title" : "øks"
    },
    {
        "@search.action": "mergeOrUpload",
        "Id": "2",
        "Title": "aks"
    },      
    {
        "@search.action": "mergeOrUpload",
        "Id": "3",
        "Title": "áks"
    },
    {
        "@search.action": "mergeOrUpload",
        "Id": "4",
        "Title": "oks"
    }                   
]}

I then ran a query with $orderby specified. I used Postman, with variables wrapped in double curly braces. Replace them with the relevant values for your environment.

https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/docs?search=*&$count=true&$select=Id,Title&searchMode=all&queryType=full&api-version={{API-VERSION}}&$orderby=Title asc

The results were returned as

{
    "@odata.context": "https://<my-search-service>.search.windows.net/indexes('dg-test-65224345')/$metadata#docs(*)",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 1.0,
            "Id": "2",
            "Title": "aks"
        },
        {
            "@search.score": 1.0,
            "Id": "4",
            "Title": "oks"
        },
        {
            "@search.score": 1.0,
            "Id": "3",
            "Title": "áks"
        },
        {
            "@search.score": 1.0,
            "Id": "1",
            "Title": "øks"
        }
    ]
}

So the sort order is indeed a, o, á, ø, which confirms what you found. The order is reversed if I change to $orderby=Title desc. Thus, the sorting appears to be done on the original value and not on the normalized value. We can check how the analyzer works by sending a sample title to the Analyze API with a POST request to

https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/analyze?api-version={{API-VERSION}}

{  "text": "øks",  "analyzer": "standardasciifolding.lucene" }

This produces the following tokens:

{
    "@odata.context": "https://<my-search-service>.search.windows.net/$metadata#Microsoft.Azure.Search.V2020_06_30_Preview.AnalyzeResult",
    "tokens": [
        {
            "token": "oks",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        },
        {
            "token": "øks",
            "startOffset": 0,
            "endOffset": 3,
            "position": 0
        }
    ]
}
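
The order returned by the $orderby query earlier is the same order a plain code-point comparison of the stored strings gives. The short Python sketch below reproduces it locally. It only illustrates that the observed order is consistent with an ordinal comparison; that the service compares values exactly this way is an assumption, not something the response confirms.

```python
# Titles as stored in the index (original, un-normalized values).
# Escapes are used so the exact code points are unambiguous.
titles = [
    "\u00f8ks",  # øks
    "aks",
    "\u00e1ks",  # áks (precomposed á)
    "oks",
]

# Python compares str values code point by code point (ordinal order).
ordinal_order = sorted(titles)
print(ordinal_order)

# Show the first code point of each title to see why the order comes out this way.
for title in ordinal_order:
    print(title, hex(ord(title[0])))
```

The accented letters sit above the basic Latin letters in Unicode, so any ordinal sort places them after a and o, even though the analyzer folds ø to o for searching.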

You could try defining a custom analyzer that produces a normalized version, but I am not sure it would work. For example, sorting does not appear to support case-insensitive ordering either, which is a related case where several characters should sort as if they were one normalized character: a and A cannot be sorted as the same character, according to this UserVoice entry (feel free to vote for it).

WORKAROUND

The best workaround I can think of is to process the data yourself. Let Title contain the original title, and then create another field called TitleNormalized where you store the normalized version. In your application you would then query with $orderby on the TitleNormalized field.
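
A minimal sketch of that preprocessing step, in Python for illustration (the same idea carries over to whatever language your indexing code uses). The helper names `normalize_title` and `with_normalized_title` and the `EXTRA_FOLDS` table are hypothetical, and the mapping table is deliberately small rather than exhaustive. Unicode decomposition removes combining accents such as the one in á, but letters like ø have no decomposition and need an explicit mapping. Lower-casing is an optional extra that also makes the sort case insensitive.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# Illustrative only; extend it for the characters in your own data.
EXTRA_FOLDS = str.maketrans({
    "\u00f8": "o",   # ø
    "\u00d8": "O",   # Ø
    "\u00e6": "ae",  # æ
    "\u00c6": "AE",  # Æ
    "\u00df": "ss",  # ß
})


def normalize_title(title: str) -> str:
    """Return an accent-free, lower-cased form of title for use as a sort key."""
    # Split accented letters into base letter + combining mark, e.g. á -> a + U+0301.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold the remaining special letters, then lower-case (optional).
    return stripped.translate(EXTRA_FOLDS).casefold()


def with_normalized_title(doc: dict) -> dict:
    """Copy a document and add the TitleNormalized field before upload."""
    return {**doc, "TitleNormalized": normalize_title(doc["Title"])}


docs = [
    {"Id": "1", "Title": "\u00f8ks"},  # øks
    {"Id": "2", "Title": "aks"},
    {"Id": "3", "Title": "\u00e1ks"},  # áks
    {"Id": "4", "Title": "oks"},
]

upload = [with_normalized_title(d) for d in docs]

# Local stand-in for $orderby=TitleNormalized asc.
ordered = sorted(upload, key=lambda d: d["TitleNormalized"])
print([(d["Id"], d["Title"], d["TitleNormalized"]) for d in ordered])
```

For this to work against the service, TitleNormalized has to exist in the index definition as a sortable field, and the query would then use $orderby=TitleNormalized asc while still returning Title for display. Documents whose normalized titles are equal (aks and áks here) have no guaranteed relative order unless you add a second sort key.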



Source: https://stackoverflow.com/questions/65224345/azure-search-accent-insensitive-analyzer-not-working-when-sorting
