MongoDB diacriticInSensitive search not showing all accented (words with diacritic mark) rows as expected and vice-versa

时光毁灭记忆、已成空白, posted on 2019-11-29 11:28:22

Since MongoDB 3.2, text indexes are diacritic insensitive. From the documentation:

With version 3, text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterpart, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.

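So the following $text query should work (the $regex query below does not use the text index, so it compares characters literally and is not diacritic insensitive):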

db.Collection.find( { $text: { $search: "iphone"} } );
db.Collection.find( { name: { $regex: "iphone"} } );

However, there appears to be a bug with the diaeresis ( ¨ ), even though it is categorized as a diacritic in the Unicode 8.0 list (issue on JIRA: SERVER-29918).
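To make the idea of diacritic folding concrete, here is a minimal sketch in plain JavaScript (runnable in Node, not in the mongo shell). It only approximates what a diacritic-insensitive index does and is not MongoDB's implementation; the helper name foldDiacritics is hypothetical.

```javascript
// Illustrative only: approximates diacritic folding in plain JavaScript.
// This is NOT MongoDB's implementation; foldDiacritics is a hypothetical helper.
function foldDiacritics(text) {
  // NFD splits "ô" into "o" + U+0302 (combining circumflex), and so on.
  // \p{Diacritic} then matches the separated marks so they can be dropped.
  return text.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

const names = ["iphone", "iphône", "iphonë", "iphônë"];
const folded = names.map(foldDiacritics);

// Every entry folds to the same unaccented form.
console.log(folded);
```

With this folding, all four names reduce to "iphone", which is the behavior one would expect from the text index for both the circumflex and the diaeresis.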

Solution

Since MongoDB 3.4, you can use a collation, which allows you to perform this kind of query.

For example, to get your expected output, run the following query:

db.Collection.find({name: "iphone"}).collation({locale: "en", strength: 1})

This will output:

{ "_id" : 1, "name" : "iphone" }
{ "_id" : 2, "name" : "iphône" }
{ "_id" : 3, "name" : "iphonë" }
{ "_id" : 4, "name" : "iphônë" }

In the collation, strength is the level of comparison to perform:

  • 1 : base characters only (diacritics and case are ignored)
  • 2 : diacritic sensitive (case is still ignored)
  • 3 : case sensitive + diacritic sensitive (the default)
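The effect of the three strength levels can be tried out without a database. The sketch below uses JavaScript's Intl.Collator (runnable in Node), which, like MongoDB's collation, is built on ICU. The mapping of sensitivity "base", "accent" and "variant" to strength 1, 2 and 3 is an assumption made for illustration, and the helper name matches is hypothetical.

```javascript
// Illustrative only: Intl.Collator is JavaScript's collation API, not MongoDB's.
// Assumed rough mapping: sensitivity "base" ~ strength 1,
// "accent" ~ strength 2, "variant" ~ strength 3.
function matches(a, b, sensitivity) {
  const collator = new Intl.Collator("en", { sensitivity });
  // compare() returns 0 when the two strings are equal under the collation.
  return collator.compare(a, b) === 0;
}

console.log(matches("iphone", "iphônë", "base"));    // true: diacritics ignored
console.log(matches("iphone", "IPHONE", "base"));    // true: case ignored
console.log(matches("iphone", "iphônë", "accent"));  // false: diacritics matter
console.log(matches("iphone", "IPHONE", "accent"));  // true: case still ignored
console.log(matches("iphone", "IPHONE", "variant")); // false: case matters too
```

This is why strength 1 is the level used in the query above: it is the only one at which "iphone" and "iphônë" compare as equal.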