Unicode Regex; Invalid XML characters

无人共我 2020-11-29 20:23

The list of valid XML characters is well known. As defined by the spec, it's:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
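Read as code, this production is a predicate on Unicode code points, not on bytes. The minimal PHP sketch below checks a single code point given as an integer. The function name `isXmlChar` is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: the XML 1.0 Char production as a test on one
// Unicode code point (an int), not on a byte or a string.
function isXmlChar(int $cp): bool
{
    return $cp === 0x9                            // tab
        || $cp === 0xA                            // line feed
        || $cp === 0xD                            // carriage return
        || ($cp >= 0x20 && $cp <= 0xD7FF)         // stops before the surrogates U+D800-U+DFFF
        || ($cp >= 0xE000 && $cp <= 0xFFFD)       // leaves out U+FFFE and U+FFFF
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);   // supplementary planes
}

var_dump(isXmlChar(0x9));    // bool(true)
var_dump(isXmlChar(0x0));    // bool(false)
var_dump(isXmlChar(0xFFFE)); // bool(false)
```

Every control character below U+20 other than tab, line feed and carriage return falls outside all six alternatives, which is why they are rejected.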


        
    一向 (OP)
    2020-11-29 20:56

    In PHP, the regex would look like this:

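    protected function isStringValid($string)
    {
        // The negated class matches any character outside the XML 1.0 Char production.
        $regex = '/[^\x{9}\x{a}\x{d}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

        // preg_match returns 0 when no invalid character is found, 1 when one is,
        // and false on error (e.g. $string is not well-formed UTF-8), so only 0 means valid.
        return preg_match($regex, $string) === 0;
    }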
    

    This covers everything the XML specification allows: the three single characters (tab, line feed, carriage return) and all three ranges:

    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
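For use outside a class, here is a standalone sketch of the same check, plus a variant that removes the offending characters with `preg_replace` instead of rejecting the whole string. The names `INVALID_XML_CHAR`, `isValidXmlString` and `stripInvalidXmlChars` are hypothetical; the sketch assumes PHP 7.1+ (for the `?string` return type) and UTF-8 input.

```php
<?php
// Hypothetical constant: the class lists what XML 1.0 allows and the leading ^
// negates it, so the pattern matches one character that is NOT allowed.
// Single quotes keep \x{...} literal so PCRE (not PHP) interprets the escapes.
const INVALID_XML_CHAR = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

function isValidXmlString(string $s): bool
{
    // 1 = found an invalid character, 0 = none found,
    // false = error (e.g. $s is not well-formed UTF-8); only 0 means valid.
    return preg_match(INVALID_XML_CHAR, $s) === 0;
}

function stripInvalidXmlChars(string $s): ?string
{
    // Deletes every disallowed character; returns null if $s is not
    // well-formed UTF-8, because preg_replace fails in that case.
    return preg_replace(INVALID_XML_CHAR, '', $s);
}

var_dump(isValidXmlString("tab\tnewline\n"));  // bool(true)
var_dump(isValidXmlString("nul\x00byte"));      // bool(false)
var_dump(stripInvalidXmlChars("a\x00b\x0Bc"));  // string(3) "abc"
```

Rejecting is arguably the safer default when the input is expected to be clean already, since stripping silently changes the data; stripping is more convenient for text that is known to contain stray control characters.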
    
