Are the PHP preg_functions multibyte safe?

我与影子孤独终老i 提交于 2019-11-26 20:53:35
outis

PCRE can support UTF-8 and other Unicode encodings, but it has to be specified at compile time. From the man page for PCRE 8.0:

The current implementation of PCRE corresponds approximately with Perl 5.10, including support for UTF-8 encoded strings and Unicode general category properties. However, UTF-8 and Unicode support has to be explicitly enabled; it is not the default. The Unicode tables correspond to Unicode release 5.1.

PHP currently uses PCRE 7.9; your system might have an older version.

Taking a look at the PCRE lib that comes with PHP 5.2, it appears that it's configured to support Unicode properties and UTF-8. Same for the 5.3 branch.

user187291

pcre supports utf8 out of the box, see documentation for the 'u' modifier.

Illustration (\xC3\xA4 is the utf8 encoding for the german letter "ä")

  echo preg_replace('~\w~', '@', "a\xC3\xA4b");

this echoes "@@¤@" because "\xC3" and "\xA4" were treated as distinct symbols

  echo preg_replace('~\w~u', '@', "a\xC3\xA4b");

(note the 'u') prints "@@@" because "\xC3\xA4" were treated as a single letter.

Gumbo

No, they are not. See the question preg_match and UTF-8 in PHP for example.

No, you need to use the multibyte string functions like mb_ereg

Some of my more complicated preg functions:

(1a) validate username as alphanumeric + underscore:

preg_match('/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/',$username) 

(1b) possible UTF alternative:

preg_match('/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/u',$username) 

(2a) validate email:

preg_match("/^([a-z0-9\+_\-]+)(\.[a-z0-9\+_\-]+)*@([a-z0-9\-]+\.)+[a-z]{2,6}$/ix",$email))

(2b) possible UTF alternative:

preg_match("/^([a-z0-9\+_\-]+)(\.[a-z0-9\+_\-]+)*@([a-z0-9\-]+\.)+[a-z]{2,6}$/ixu",$email))

(3a) normalize newlines:

preg_replace("/(\n){2,}/","\n\n",$str);

(3b) possible UTF alternative:

preg_replace("/(\n){2,}/u","\n\n",$str);

Do thse changes look alright?

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!