JavaScript regex case-insensitive Turkish character issue

Submitted by 有些话、适合烂在心里 on 2019-11-29 17:21:16

How a regex treats lowercase and uppercase characters depends on the characters' code points (hex codes) and on how the Unicode Consortium pairs the two cases for that script (for any language, I hope, since Unicode is based on international standards).

E.g., for English:

Similarly, we have

Above, characters highlighted in the same color are the uppercase and lowercase forms of the same letter, and their hex codes differ in only one digit. For Ê the hex code is 00CA and for ê it is 00EA; the only difference is C versus E in the third position.

Similarly, for Ý and ý the hex codes are 00DD and 00FD; the only difference is D versus F.
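The hex codes quoted here can be checked directly in JavaScript. This is only a minimal sketch: the helper name `hex` and the list of pairs are made up for illustration, and the characters are written as `\u` escapes so the file encoding cannot change them.

```javascript
// Hypothetical helper: format a character's code point as 4 hex digits.
function hex(ch) {
  return ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

// [uppercase, lowercase] pairs: A/a, Ê/ê, Ý/ý
const pairs = [['A', 'a'], ['\u00CA', '\u00EA'], ['\u00DD', '\u00FD']];

for (const [upper, lower] of pairs) {
  // For these particular pairs the two code points are 0x20 apart.
  const diff = lower.codePointAt(0) - upper.codePointAt(0);
  console.log(upper, hex(upper), lower, hex(lower), 'diff = 0x' + diff.toString(16));
}

// The 0x20 pattern is not universal: İ is U+0130 while i is U+0069.
console.log(hex('\u0130'), hex('i'));
```

The last line hints at why the Turkish dotted capital İ behaves differently from the pairs above: it is not paired with i by a simple one-digit offset.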

Now check this example:

'ÊÌÝêìý'.match(/Ì/gi) //case insensitive
//output ["Ì", "ì"]
'ÊÌÝêìý'.match(/Ì/g) //case sensitive
//output ["Ì"]

'ÊÌÝêìý'.match(/Ý/ig) //case insensitive
//output ["Ý", "ý"]
'ÊÌÝêìý'.match(/Ý/g) //case sensitive
//output ["Ý"]

If you are using the right characters, it should work normally. I don't know much about Latin-Turkish characters.

This is a matter of Unicode characters.

What happens is that the dotted i in your example is not a single character but two, because the mark above it (a combining dot above, U+0307, rather than a tilde) counts as a character as well. This brings a lot of complexity and many rules that need to be followed in order to comply with Unicode.
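One way to see the two-character effect in JavaScript is to lowercase İ (U+0130) and inspect the result. This is a small sketch with illustrative variable names; the extra character it reveals is U+0307 COMBINING DOT ABOVE.

```javascript
const capitalDottedI = '\u0130'; // İ, LATIN CAPITAL LETTER I WITH DOT ABOVE

// The locale-independent toLowerCase() yields "i" followed by a combining dot.
const lowered = capitalDottedI.toLowerCase();
const codePoints = Array.from(lowered, ch =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(lowered.length); // 2
console.log(codePoints);     // [ 'U+0069', 'U+0307' ]

// The i flag does not pair İ with a plain i, in either direction.
const plainMatchesDotted = /i/i.test(capitalDottedI);
const dottedMatchesPlain = /\u0130/i.test('i');
console.log(plainMatchesDotted, dottedMatchesPlain); // false false
```

Because the lowercase form is two code units long, the regex engine cannot treat İ and i as a simple case pair, which is why the `i` flag alone does not help here.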

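You could do something like ([\x{0049}-\x{0130}]) to meet your i needs, but the syntax varies depending on whether you use the expression in .NET, Java, JavaScript or PHP. In JavaScript, for instance, the escapes are written \u0049 and \u0130. Note also that this range covers every code point from I (U+0049) to İ (U+0130), not only the i variants.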
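In JavaScript the same idea could be written with `\uXXXX` escapes. The sketch below is an assumption about what "your i needs" means: it lists the four i letters (i, I, İ, ı) explicitly instead of using a range, and the names `anyI`, `wideRange` and `sample` are made up for illustration.

```javascript
// Explicit class: i, I, İ (U+0130) and ı (U+0131).
const anyI = /[iI\u0130\u0131]/g;

const sample = 'Istanbul \u0130stanbul istanbul \u0131stanbul';
const found = sample.match(anyI);
console.log(found); // the four different leading letters

// For comparison, the range form also accepts unrelated letters,
// because z (U+007A) lies between U+0049 and U+0130.
const wideRange = /[\u0049-\u0130]/;
console.log(wideRange.test('z')); // true
```

Listing the letters keeps the pattern narrow; the range form is shorter but matches most of the Latin alphabet as well.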


You could also check what code each character represents here:

http://www.fileformat.info/info/unicode/char/search.htm?q=%C4%B0&preview=entity

You can list the lowercase and uppercase forms explicitly in a bracket (character class):

/[İi]stanbul/i
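A quick way to try this pattern is to run it against a few spellings. This is just a sketch; the input list and variable names are invented for illustration, and İ is written as `\u0130`.

```javascript
const city = /[\u0130i]stanbul/i; // same pattern as /[İi]stanbul/i

const inputs = ['\u0130stanbul', 'istanbul', 'Istanbul', 'ISTANBUL', '\u0130STANBUL'];
const results = inputs.map(s => city.test(s));
inputs.forEach((s, k) => console.log(s, results[k])); // every line prints true

// Without the bracket, the dotted capital is not matched.
const withoutBracket = /istanbul/i.test('\u0130stanbul');
console.log(withoutBracket); // false
```

The bracket handles İ, while the `i` flag still takes care of the ordinary I/i pair and the rest of the word.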

