PCRE Encoding Support

Posted by 天涯浪子 on 2019-12-24 12:24:29

Question


I saw in the PCRE documentation that PCRE supports UTF-8 and Unicode general category properties, but I don't see where it states which native encoding it supports.

If you say that it supports ISO-8859-1, where can I find information about that?

In A Nutshell:

I've compared the two, and I'm guessing that the encoding supported by PHP is Windows-1252 and not ISO-8859-1.

if(preg_match('/€/',"\x80"))
    echo "Match";

ISO-8859-1 doesn't have the '€' in that position. Windows-1252 does. Or does it depend on the system?

So which native encoding does PCRE support?


Answer 1:


This exact example is used on regular-expressions.info to describe the difficulties of mixing 8-bit encodings and Unicode:

Mixing Unicode and 8-bit Character Codes

In short, the Euro symbol sits at 80h in Windows-1252 and most other Windows code pages. How your regex engine treats that byte may vary. The match works when your regex engine is an 8-bit one and the text uses a Windows code page.
If your regex engine is a pure Unicode one, it will read \x80 as \u0080, which is a control code.
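A minimal sketch of the 8-bit case in PHP, assuming no /u modifier is used so that PCRE compares raw bytes. The variable names are only illustrative. The patterns are written with explicit byte escapes so the result does not depend on how the source file itself is saved:

```php
<?php
// A single byte 0x80: the Euro sign in Windows-1252, a control code in ISO-8859-1.
$subject = "\x80";

// Without /u, PCRE matches byte for byte, so the byte 0x80 matches itself.
$byteMatch = preg_match('/\x80/', $subject);

// The Euro sign encoded as UTF-8 is the three bytes E2 82 AC,
// which cannot match a subject consisting of the single byte 0x80.
$utf8Match = preg_match("/\xE2\x82\xAC/", $subject);

var_dump($byteMatch, $utf8Match);
```

Under these assumptions, a literal '€' typed into the pattern behaves like the first case when the script file is saved as Windows-1252 and like the second when it is saved as UTF-8.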

So what do you mean by the native encoding that PCRE supports? This is system dependent, and you should not rely on any particular code page.

The advantage of Unicode is that you can get rid of all the different code pages and all of the problems that come with them.

So to use Unicode here, try matching \x{20AC}, which is the Unicode code point for the Euro symbol. In PHP this requires the u modifier, and the subject must be UTF-8.
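A rough sketch of the Unicode approach in PHP. The helper names `containsEuro` and `cp1252ToUtf8` are hypothetical and introduced only for illustration, and the conversion helper assumes the mbstring extension is installed:

```php
<?php
// Hypothetical helper: true if the UTF-8 text contains U+20AC (the Euro sign).
function containsEuro(string $utf8Text): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    // For a subject that is not valid UTF-8, preg_match() returns false,
    // so the strict comparison with 1 yields false as well.
    return preg_match('/\x{20AC}/u', $utf8Text) === 1;
}

// Hypothetical helper: convert Windows-1252 bytes to UTF-8 (assumes mbstring).
function cp1252ToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

var_dump(containsEuro("\xE2\x82\xAC")); // the Euro sign as UTF-8 bytes
var_dump(containsEuro("\x80"));         // a lone 0x80 byte is not valid UTF-8
```

If the input arrives as Windows-1252, one option is to convert it first, for example `containsEuro(cp1252ToUtf8("\x80"))`, rather than relying on a code page at match time.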

regular-expressions.info also has an overview of the Unicode syntax.



Source: https://stackoverflow.com/questions/6658902/pcre-encoding-support
