regex to match non-latin char with ASCII 0-31 and 128-255

柔情痞子 submitted on 2021-02-16 20:33:08

Question


I want to match non-Latin characters. As I understand it, a.matches("[\\x8A-\\xFF]+") should return true for the string below, but it returns false.

String a = "Ž";
if (a.matches("[\\x8A-\\xFF]+"))
{
    // expected to get here, but matches() returns false
}

Answer 1:


Judging from your title:

Regex to match non-latin char with ASCII 0-31 and 128-255

it seems you're after all characters except those in range 32-127 and you're surprised Ž doesn't match.

If this is correct, I suggest you use the expression [^\x20-\x7F] ("all characters except those in range 32-127"). This does match Ž.

(An exact translation of the regex in your title would look like [\x00-\x1F\x80-\xFF] but this still doesn't match Ž as described below.)
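The two expressions above can be compared side by side. The following is a minimal sketch, not part of the original answer: the class and method names are hypothetical, and Ž is written as the escape \u017D so the result does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

public class NonAsciiMatchDemo {
    // Suggested expression: every character except U+0020..U+007F.
    static final Pattern OUTSIDE_32_TO_127 = Pattern.compile("[^\\x20-\\x7F]+");

    // Literal translation of the title: U+0000..U+001F and U+0080..U+00FF.
    static final Pattern TITLE_RANGES = Pattern.compile("[\\x00-\\x1F\\x80-\\xFF]+");

    // Hypothetical helper: true if every character lies outside 32-127.
    static boolean outside32To127(String s) {
        return OUTSIDE_32_TO_127.matcher(s).matches();
    }

    // Hypothetical helper: true if every character lies in 0-31 or 128-255.
    static boolean inTitleRanges(String s) {
        return TITLE_RANGES.matcher(s).matches();
    }

    public static void main(String[] args) {
        String z = "\u017D"; // Ž, LATIN CAPITAL LETTER Z WITH CARON
        System.out.println(outside32To127(z));      // true
        System.out.println(inTitleRanges(z));       // false: U+017D is above U+00FF
        System.out.println(inTitleRanges("\u00E9")); // true: é is U+00E9
        System.out.println(outside32To127("A"));    // false: A is U+0041
    }
}
```

Note that the negated class also accepts everything above U+00FF, so it is broader than the ranges named in the title.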

Why your initial attempt didn't work:

In a Java regex, \xNN matches the character whose Unicode code point is 0xNN. The code point of Ž is U+017D (0x017D), so it falls outside the range \x8A-\xFF.

When you say "Ž" is 8E, you're most likely seeing a value from an extended ASCII table (in Windows-1252, for example, Ž is 0x8E). The Java regex engine works with Unicode code points, not with those byte values.
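The difference between the two numbers can be checked directly. This is a rough sketch with hypothetical class and method names. It assumes the windows-1252 charset is available on the JVM, which is common but not guaranteed by the Java specification.

```java
import java.nio.charset.Charset;

public class CodePointVsByteDemo {
    // Unicode code point of the first character, which is what the regex engine compares.
    static int codePoint(String s) {
        return s.codePointAt(0);
    }

    // Byte value of the first character in Windows-1252 (assumed to be an available charset).
    static int windows1252Byte(String s) {
        byte[] bytes = s.getBytes(Charset.forName("windows-1252"));
        return bytes[0] & 0xFF; // convert the signed byte to 0..255
    }

    public static void main(String[] args) {
        String z = "\u017D"; // Ž
        System.out.println(Integer.toHexString(codePoint(z)));       // 17d
        System.out.println(Integer.toHexString(windows1252Byte(z))); // 8e
    }
}
```

The value 8E exists only after the string has been encoded to bytes. String.matches never sees it, because it operates on the characters of the string.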



Source: https://stackoverflow.com/questions/30500028/regex-to-match-non-latin-char-with-ascii-0-31-and-128-255
