Why did this str_ireplace() work on a non-ASCII string?

Submitted by 戏子无情 on 2019-12-23 08:46:20

Question


Note: What I think I know is probably wrong, so please kindly fix my knowledge :)


I just answered a question about UTF-8 and PHP.

I suggested using str_ireplace('Волгоград', '', $a).

I didn't expect this to work, but it did.

I always thought PHP treated one byte as one character, hence the need for the mb_* functions to get accurate results with characters outside the ASCII range.

I assumed the Russian characters would take more than one byte each.

I thought str_replace() would work because the bytes could be matched regardless of whether they are multibyte or not, as long as they are in order.
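A minimal Python sketch of that reasoning, using Python bytes as a stand-in for PHP strings (an illustration, not PHP itself): the replacement succeeds purely by matching the byte sequence in order. UTF-8 is self-synchronizing, so a complete encoded character sequence can only match where a character starts, never halfway through another character.

```python
# Python bytes used as a stand-in for PHP's byte-oriented str_replace().
needle = "Волгоград".encode("utf-8")
haystack = "Город Волгоград на Волге".encode("utf-8")

# Each Cyrillic letter takes 2 bytes in UTF-8, so 9 letters become 18 bytes.
print(len("Волгоград"), len(needle))  # 9 18

# A byte-wise search only needs the bytes to appear in order;
# it never has to know where one character ends and the next begins.
result = haystack.replace(needle, b"")
print(result.decode("utf-8"))
```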

I thought str_ireplace() would not work because PHP wouldn't know how to map the non-ASCII characters to their other-case equivalents. But it did work.


Where and how am I wrong? Give me as much information as you can :)


Answer 1:


It works by lowercasing the text with the libc functions, which depend on the locale settings. With appropriate settings, the text is lowercased properly as long as the bytes are in the charset that the locale expects.
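A rough Python model of that mechanism, under stated assumptions: both strings are lowercased one byte at a time through a 256-entry table (the way C's tolower() works on single bytes), the match is done on the lowered copies, and the replacement is spliced into the original. make_lower_table and ireplace_bytes are hypothetical helpers; the ascii codec stands in for the "C" locale and Python's cp1251 codec for a Russian single-byte locale such as Windows-1251.

```python
def make_lower_table(codec):
    """Build a 256-byte table mapping each byte to its lowercase byte in a single-byte codec."""
    table = bytearray(range(256))  # identity mapping by default
    for b in range(256):
        try:
            low = bytes([b]).decode(codec).lower().encode(codec)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # byte not defined in this codec: leave it unchanged
        if len(low) == 1:
            table[b] = low[0]
    return bytes(table)


def ireplace_bytes(haystack, needle, replacement, table):
    """Case-insensitive byte replace: match on lowered copies, splice into the original."""
    if not needle:
        return haystack
    low_hay = haystack.translate(table)  # translate() keeps the length, so offsets line up
    low_needle = needle.translate(table)
    out = bytearray()
    i = 0
    while True:
        j = low_hay.find(low_needle, i)
        if j == -1:
            out += haystack[i:]
            return bytes(out)
        out += haystack[i:j] + replacement
        i = j + len(needle)


ASCII_ONLY = make_lower_table("ascii")  # stand-in for the "C" locale
CP1251 = make_lower_table("cp1251")     # stand-in for a Windows-1251 Russian locale

utf8_text = "Город ВОЛГОГРАД".encode("utf-8")
print(ireplace_bytes(utf8_text, "волгоград".encode("utf-8"), b"", ASCII_ONLY) == utf8_text)  # True: no match

cp_text = "Город ВОЛГОГРАД".encode("cp1251")
print(ireplace_bytes(cp_text, "волгоград".encode("cp1251"), b"", CP1251).decode("cp1251"))
```

Under these assumptions, the ASCII-only table leaves every byte at or above 0x80 alone, so UTF-8 "ВОЛГОГРАД" does not match "волгоград", while the same text as Windows-1251 bytes does. A same-case match succeeds with any table, because haystack and needle are lowered identically.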




Answer 2:


Another possible explanation: parts of Unicode follow the same pattern as the ISO-8859-1 range, where uppercase and lowercase letters differ by a fixed offset.

Converting an uppercase letter into lowercase just requires adding 0x20 in the ASCII range:

0x41   A
0x61   a

The same holds for the Latin-1 letters in 0xC0-0xDE (except 0xD7, the multiplication sign), and this might coincidentally work for the Russian letters in the Unicode range too:

d092d09ed09bd093d09ed093d0a0d090d094   ВОЛГОГРАД
d0b2d0bed0bbd0b3d0bed0b3d180d0b0d0b4   волгоград

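For most letters, the difference is just that 0x20 has been added to the second byte, the one that would be read as a Latin-1 character. Р/р is the exception: d0a0 becomes d180, so the lead byte changes as well. So it's probably really just a locale setting.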
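A small Python check of the two hex dumps above, with Python used only as a calculator: it confirms the byte values, shows that every capital in the word sits exactly 0x20 code points below its small letter, and flags the letter whose UTF-8 lead byte changes. compare is a hypothetical helper.

```python
upper = "ВОЛГОГРАД"
lower = "волгоград"

# Hex dumps of the UTF-8 bytes, as quoted in the answer.
print(upper.encode("utf-8").hex())
print(lower.encode("utf-8").hex())


def compare(u, l):
    """Describe how one uppercase/lowercase pair differs."""
    ub, lb = u.encode("utf-8"), l.encode("utf-8")
    return {
        "cp_diff": ord(l) - ord(u),     # offset between the Unicode code points
        "lead_same": ub[0] == lb[0],    # does the first UTF-8 byte stay the same?
        "last_diff": lb[-1] - ub[-1],   # offset on the last UTF-8 byte
    }


for u, l in zip(upper, lower):
    print(u, l, compare(u, l))
```

The code point offset is 0x20 for every pair, but for Р/р the UTF-8 encoding crosses into a new lead byte, so the last byte goes down by 0x20 instead of up.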




Answer 3:


It's the other way round: PHP does not treat every character as a byte; it treats every byte as a character. So a multibyte character is seen as several characters (and probably not the ones you expect).
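A short Python illustration of byte-as-character semantics, assuming UTF-8 input: counting bytes gives what PHP's strlen() reports, counting code points gives what mb_strlen($s, 'UTF-8') reports, and cutting after one byte leaves half a character.

```python
word = "Волгоград"
raw = word.encode("utf-8")

# Byte count (like strlen()) vs. code point count (like mb_strlen() with UTF-8).
print(len(raw), len(word))  # 18 9

# Taking the "first character" byte-wise yields only the lead byte of 'В',
# which is not valid UTF-8 on its own.
first = raw[:1]
print(first.hex())  # d0
print(first.decode("utf-8", errors="replace"))  # the replacement character U+FFFD
```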



Source: https://stackoverflow.com/questions/5458824/why-did-this-str-ireplace-work-on-a-non-ascii-string
