Chinese queries result in unexpectedly high recall

陌路散爱 submitted on 2021-02-11 12:46:41

Question


We are seeing unexpectedly high recall for Chinese queries. I have managed to reproduce the problem in a minimal case, using a simple data model with only two properties.

REPRODUCE

  1. Define a property DescriptionZhCn for Chinese product descriptions, using the zh-Hans.microsoft analyzer.

  2. Populate two records with the following values in DescriptionZhCn:

    Contoso 减振接杆

    Contoso 缩径接柄

  3. Search using options searchMode=all, queryType=full, searchFields=DescriptionZhCn, api-version=2019-05-06 with the following values in the search parameter:

    减振接杆

    缩径接柄
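For reference, the request in step 3 can be sketched as a URL builder. This is only an illustration of the query parameters listed above: the service name `my-service` and index name `products` are hypothetical placeholders, and the `api-key` header that a real request needs is omitted.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholders: the question does not name the service or the index.
SERVICE = "my-service"
INDEX = "products"


def build_search_url(search_text: str) -> str:
    """Build the GET URL for the search described in step 3."""
    params = {
        "search": search_text,
        "searchMode": "all",
        "queryType": "full",
        "searchFields": "DescriptionZhCn",
        "api-version": "2019-05-06",
    }
    # urlencode percent-encodes the Chinese characters as UTF-8.
    query = urlencode(params)
    return f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?{query}"


if __name__ == "__main__":
    for text in ("减振接杆", "缩径接柄"):
        print(build_search_url(text))
```

Sending each URL with a valid `api-key` header should reproduce the two searches from step 3.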

EXPECTED

When searching for 减振接杆 I would expect only the record with description "Contoso 减振接杆". When searching for 缩径接柄 I would expect only the record "Contoso 缩径接柄".

ACTUAL

Searching for either 减振接杆 or 缩径接柄 unexpectedly returns both records. The only character the two strings have in common is the third one, 接.

I have verified the output of the zh-Hans.microsoft analyzer: it splits each of the Chinese strings into 4 tokens. For example:

减振接杆 => 减 振 接 杆
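The tokenization can be checked with the Analyze Text REST endpoint of the index. The sketch below only builds the request (URL and JSON body) and does not send it; the service and index names are hypothetical placeholders, and a real call additionally needs an `api-key` header.

```python
import json

# Hypothetical placeholders: the question does not name the service or the index.
SERVICE = "my-service"
INDEX = "products"


def build_analyze_request(text: str) -> tuple[str, str]:
    """Return (url, json_body) for a POST to the index's analyze endpoint."""
    url = (
        f"https://{SERVICE}.search.windows.net"
        f"/indexes/{INDEX}/analyze?api-version=2019-05-06"
    )
    # ensure_ascii=False keeps the Chinese text readable in the body.
    body = json.dumps(
        {"text": text, "analyzer": "zh-Hans.microsoft"},
        ensure_ascii=False,
    )
    return url, body


if __name__ == "__main__":
    url, body = build_analyze_request("减振接杆")
    print("POST", url)
    print(body)
```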

My query only matches one of the tokens. And I'm using searchMode=all. Why does my query match? Is this a bug? Any input Yanoosh, Liam?
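To make the expectation concrete, here is a toy model of the matching I have in mind. It is not the service's query parser and does not explain the behavior; it only assumes the one-token-per-character analyzer output reported above (plus a lowercased `contoso` token, which is my assumption) and compares "all terms must match" with "any term may match".

```python
# Assumed analyzer output: one token per Chinese character, as reported above.
# The "contoso" token is an assumption made for illustration.
DOCS = {
    "Contoso 减振接杆": {"contoso", "减", "振", "接", "杆"},
    "Contoso 缩径接柄": {"contoso", "缩", "径", "接", "柄"},
}


def match(query_tokens: list[str], mode: str) -> list[str]:
    """Return the documents matching the query tokens under 'all' or 'any'."""
    combine = all if mode == "all" else any
    return [
        doc
        for doc, tokens in DOCS.items()
        if combine(token in tokens for token in query_tokens)
    ]


if __name__ == "__main__":
    query = list("减振接杆")  # one token per character
    print("all:", match(query, "all"))
    print("any:", match(query, "any"))
```

In this toy model, "all" returns only the record containing every query token, while "any" returns both records because they share 接. The second result is what I actually observe from the service, even though I pass searchMode=all.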

Source: https://stackoverflow.com/questions/64485275/chinese-queries-result-in-unexpectly-high-recall
