REGEXP returning false on special characters

这一生的挚爱 submitted on 2019-12-24 13:48:21

Question


I'm not very good with regular expressions and hope someone can explain this to me. I found the following pattern in code that I am debugging, and I wonder why it always returns false in this scenario.

I know that \p{L} matches a single code point in the category "letter", and that 0-9 matches digits.

$regExp = /^\s*
     (?P([0-2]?[1-9]|[12]0|3[01]))\s+
     (?P\p{L}+?)\s+
     (?P[12]\d{3})\s*$/i;

$value = '12 Février 2015';
$matches = array();

$match = preg_match($regExp, $value, $matches);

As additional information, I reduced the problem to this:

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/i", "18 Février 2015");
var_dump($match); // prints int(0)

But if the value is 18 February 2015, it prints int(1). Why is that? It is supposed to return 1 for both values, because \p{L} should accept Unicode characters.


Answer 1:


$regExp = '/^\s*(?P<d>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<m>\p{L}+?)\s+(?P<y>[12]\d{3})\s*$/usD';

$value = '12 Février 2015';
$matches = array();

$match = preg_match($regExp, $value, $matches);

var_dump($matches);

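The (?P construct must always be followed by <name>; leaving the name out causes a compilation error. For Unicode multiline strings you also need the usD flags: u treats the pattern and subject as UTF-8, s lets the dot match newlines, and D makes $ match only at the very end of the subject. They are easy to remember: think of "USA dollar".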
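To make the effect of the u flag concrete, here is a minimal sketch that compiles the same pattern body with and without it. The variable names ($body, $withoutU, $withU) are illustrative and not from the original code, and "\u{E9}" is used for é so the example does not depend on the encoding of the source file (this escape assumes PHP 7 or later).

```php
<?php
// Same pattern body as in the question; only the modifiers differ.
$body = '^\s*(?P<monthDay>[0-2]?[1-9]|[12]0|3[01])\s+'
      . '(?P<monthNameFull>\p{L}+?)\s+'
      . '(?P<yearFull>[12]\d{3})\s*$';

// "\u{E9}" is é. In UTF-8 it is stored as the two bytes C3 A9.
$value = "12 F\u{E9}vrier 2015";

// Without u, PCRE works byte by byte, and the byte A9 is not a letter.
$withoutU = preg_match('/' . $body . '/i', $value);

// With u, C3 A9 is decoded as one code point, which \p{L} accepts.
$withU = preg_match('/' . $body . '/u', $value, $matches);

var_dump(bin2hex("\u{E9}"));          // string(4) "c3a9"
var_dump($withoutU);                  // int(0)
var_dump($withU);                     // int(1)
var_dump($matches['monthNameFull']);  // the month name, "Février"
```

Of the three flags, only u appears to be required for this particular input; s and D do not change the result here.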




Answer 2:


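Named groups are not needed here, and their syntax in the question is wrong anyway (the names are missing). This cleaned-up version should work, provided the u modifier is added so that the accented letter is read as a single UTF-8 character:

/^\s*([0-2]?[1-9]|[12]0|3[01])\s+\p{L}+?\s+[12]\d{3}\s*$/iu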

The pattern for the day of the month would also be more intelligible as:

(0?[1-9]|[12][0-9]|3[01])
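As a quick sanity check, the two day-of-month patterns can be compared over a range of candidate strings. This is only a sketch: the variable names and the range 0 to 40 (with and without a leading zero) are choices made for illustration.

```php
<?php
// Day pattern from the question and the simplified one suggested above.
$original   = '/^([0-2]?[1-9]|[12]0|3[01])$/';
$simplified = '/^(0?[1-9]|[12][0-9]|3[01])$/';

// Build "0".."40" plus the zero-padded forms "00".."09".
$candidates = [];
for ($i = 0; $i <= 40; $i++) {
    $candidates[] = (string) $i;
    $candidates[] = sprintf('%02d', $i);
}

// Collect every candidate on which the two patterns disagree.
$differences = [];
foreach (array_unique($candidates) as $day) {
    if (preg_match($original, $day) !== preg_match($simplified, $day)) {
        $differences[] = $day;
    }
}

var_dump($differences); // an empty array: both accept exactly 1-31 and 01-09
```

Over this range the two patterns accept the same strings, so the rewrite changes readability only.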




Answer 3:


I figured out a fix: use the u modifier instead of i.

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/u", "18 Février 2015");
var_dump($match); // prints int(1)

Thanks, everyone, for the help.




Answer 4:


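Use the u modifier for Unicode:

$regExp = '/^\s*
   (?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
   (?P<monthNameFull>\p{L}+?)\s+
   (?P<yearFull>[12]\d{3})\s*$/ux';
// u: treat the pattern and the subject as UTF-8
// x: ignore the whitespace used to lay the pattern out over several lines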

The i modifier is not needed: \p{L} matches letters in both upper and lower case.
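A short sketch of that last point, assuming PHP 7 or later for the "\u{...}" escapes; the variable names are illustrative. The \p{Ll} line is added only as a contrast, since that narrower property covers lowercase letters alone.

```php
<?php
// \p{L} covers every letter category, so no i modifier is involved here.
$anyLetter = '/^\p{L}+$/u';

$lower = "f\u{E9}vrier";  // "février"
$upper = "F\u{C9}VRIER";  // "FÉVRIER"

var_dump(preg_match($anyLetter, $lower)); // int(1)
var_dump(preg_match($anyLetter, $upper)); // int(1)

// Contrast: \p{Ll} accepts lowercase letters only.
var_dump(preg_match('/^\p{Ll}+$/u', $upper)); // int(0)
```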



Source: https://stackoverflow.com/questions/25669905/regexp-returning-false-on-special-characters
