Java regex: why are numbers [0-9], comma, etc. not Unicode letters?

血红的双手。 submitted on 2021-02-05 12:28:04

Question


class Test
{
    public static void main (String[] args)
    {
        String regex = "\\p{L}";
        System.out.println("0".matches(regex));
    }
}

The code above prints false, but I was expecting true. Isn't ASCII a subset of Unicode? "0" is part of ASCII, so I thought it should also match as a Unicode letter.

Also, a comma, a period, etc. print false, while "a" prints true.


Answer 1:


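It is because \\p{L} matches only characters in the Unicode Letter category, and you're matching a digit. "0" is a Unicode character, but its category is Nd (decimal digit), not L; comma and period are punctuation.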
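
As a rough illustration of how these categories differ, the sketch below checks a few single-character strings against \\p{L}, \\p{Nd} and \\p{P}. The class name CategoryDemo and its helper methods are hypothetical and exist only for this example.

```java
class CategoryDemo {
    // True when the whole string is one character in Unicode category L (letter).
    static boolean isLetter(String s) {
        return s.matches("\\p{L}");
    }

    // True when the whole string is one character in category Nd (decimal digit).
    static boolean isDecimalDigit(String s) {
        return s.matches("\\p{Nd}");
    }

    // True when the whole string is one character in category P (punctuation).
    static boolean isPunctuation(String s) {
        return s.matches("\\p{P}");
    }

    public static void main(String[] args) {
        // "\u00e9" is a non-ASCII letter (e with acute accent).
        String[] samples = {"a", "\u00e9", "0", ",", "."};
        for (String s : samples) {
            System.out.println(s
                    + " letter=" + isLetter(s)
                    + " digit=" + isDecimalDigit(s)
                    + " punctuation=" + isPunctuation(s));
        }
    }
}
```

Every sample is a valid Unicode character, but only "a" and the accented letter fall in category L, which is why \\p{L} rejects "0", "," and ".".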

You can use:

[\\p{L}\\p{Nd}.,]

to match a Unicode letter, a Unicode decimal digit, a period, or a comma.

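You can also put (?U) in front of your regex. It enables the UNICODE_CHARACTER_CLASS flag, which makes predefined classes such as \\w and \\d Unicode-aware (\\p{L} and \\p{Nd} already are without it), like this: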

String regex = "(?U)[\\p{L}\\p{Nd}.,]+";
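
A minimal sketch of how the character class and the (?U) flag behave, assuming a standard JDK. The class name UnicodeFlagDemo and its helper methods are hypothetical and not part of the original answer.

```java
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    // Letters, decimal digits, period and comma, as in the character class above.
    static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\p{Nd}.,]+");

    static boolean allowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    // Without (?U), \w covers only ASCII letters, digits and underscore.
    static boolean wordDefault(String s) {
        return s.matches("\\w+");
    }

    // With (?U), \w follows the Unicode definition of a word character.
    static boolean wordUnicode(String s) {
        return s.matches("(?U)\\w+");
    }

    public static void main(String[] args) {
        System.out.println(allowed("abc,123."));             // true
        System.out.println(allowed("a b"));                  // false: space is not in the class
        System.out.println(wordDefault("\u00e9t\u00e9"));    // false: accented letters are not ASCII
        System.out.println(wordUnicode("\u00e9t\u00e9"));    // true
    }
}
```

The \\p{...} properties give the same result with or without the flag; the difference shows up only for shorthand classes such as \\w.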


Source: https://stackoverflow.com/questions/41846074/java-regex-why-numbers-0-9-comma-etc-is-not-an-unicode
