Regex for validating letters and digits in a localized string

巧了我就是萌 submitted on 2019-12-18 02:43:19


I have a localized input field. I need to add a regex validation so that it accepts only letters and digits. If the input were English only, I could have used [a-z0-9].

As of now, I am using the method Character.isLetterOrDigit(name.charAt(i)) (yes, I am iterating through each character) to check for the letters and digits of the various languages.

Are there any better ways of doing it? Any regex or other libraries available for this?


Since Java 7 you can use Pattern.UNICODE_CHARACTER_CLASS:

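import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Müller";

// With the flag, \w also matches non-ASCII letters such as "ü"
Pattern p = Pattern.compile("^\\w+$", Pattern.UNICODE_CHARACTER_CLASS);
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println("found");
} else {
    System.out.println("not found");
}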

Without the option, the pattern does not recognize the word "Müller". With it, it does, because Pattern.UNICODE_CHARACTER_CLASS:

Enables the Unicode version of Predefined character classes and POSIX character classes.
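
To make the difference concrete, here is a small sketch that compiles the same pattern with and without the flag. The class and method names (UnicodeWordDemo, matchesAscii, matchesUnicode) are made up for illustration. One thing to keep in mind: \w also accepts the underscore, so on its own it is slightly broader than "letters and digits only".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the original answer.
class UnicodeWordDemo {
    // Default \w: ASCII letters, ASCII digits and underscore only.
    static final Pattern ASCII_WORD = Pattern.compile("^\\w+$");

    // With the flag, \w uses the Unicode definition of a word character.
    static final Pattern UNICODE_WORD =
            Pattern.compile("^\\w+$", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesAscii(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // "Müller", escaped to avoid source-encoding issues
        System.out.println(matchesAscii(name));    // false
        System.out.println(matchesUnicode(name));  // true
        System.out.println(matchesUnicode("a_b")); // true: underscore counts as a word character
    }
}
```

Compiling the patterns once as constants avoids recompiling them on every validation call.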

See here for more details

You can also have a look here for more Unicode information in Java 7, and here for an overview of the Unicode scripts, properties and blocks.

See here a famous answer from tchrist about the caveats of regex in Java, including an update on what has changed with Java 7 (or will change in Java 8).


boolean foundMatch = name.matches("[\\p{L}\\p{Nd}]*");

should work.

[\p{L}\p{Nd}] matches a character that is either a Unicode letter or a Unicode decimal digit. The String.matches() method ensures that the entire string matches the pattern. Note that the * quantifier also accepts the empty string.
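
A minimal sketch of how this could be wrapped for reuse, assuming you want to compile the pattern once rather than on every String.matches() call. The class and method names (LetterDigitRegex, isValid, isValidNonEmpty) are hypothetical. The second pattern uses + instead of *, in case empty input should be rejected.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the original answer.
class LetterDigitRegex {
    // \p{L} = any Unicode letter, \p{Nd} = any Unicode decimal digit.
    // '*' accepts the empty string, '+' requires at least one character.
    static final Pattern ALLOW_EMPTY = Pattern.compile("[\\p{L}\\p{Nd}]*");
    static final Pattern NON_EMPTY = Pattern.compile("[\\p{L}\\p{Nd}]+");

    static boolean isValid(String name) {
        return ALLOW_EMPTY.matcher(name).matches();
    }

    static boolean isValidNonEmpty(String name) {
        return NON_EMPTY.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("M\u00FCller42"));       // true
        System.out.println(isValid("\u0967\u0968\u0969")); // true: Devanagari digits 1, 2, 3
        System.out.println(isValid("M\u00FCller!"));        // false: '!' is neither letter nor digit
        System.out.println(isValid(""));                    // true: '*' allows empty input
        System.out.println(isValidNonEmpty(""));            // false
    }
}
```

Unlike \w, this character class does not accept the underscore.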


Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

-- Jamie Zawinski

I say this in jest, but iterating through the String as you are doing will perform at least as well as any regex. There is no way a regex can do what you want any faster, and you avoid the overhead of compiling a pattern in the first place.

So as long as:

  • the validation doesn't need to do anything else regex-like (nothing was mentioned in the question)
  • the intention of the code looping through the String is clear (and if not, refactor until it is)

then why replace it with a regex just because you can?
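
For what it's worth, one way the loop could be refactored so its intent is obvious is to pull it into a small, well-named method. This is only a sketch: the class and method names are hypothetical, and it iterates by code point rather than with charAt(i) as in the question, so that letters outside the Basic Multilingual Plane (which occupy two chars) are checked as a whole instead of as two surrogates.

```java
// Hypothetical helper; the names are not from the question.
class LetterOrDigitCheck {
    /**
     * Returns true if every code point in the string is a letter or a digit.
     * An empty string returns true, because there is nothing to reject.
     */
    static boolean containsOnlyLettersAndDigits(String name) {
        for (int i = 0; i < name.length(); ) {
            int codePoint = name.codePointAt(i);
            if (!Character.isLetterOrDigit(codePoint)) {
                return false;
            }
            // Advance by 1 or 2 chars, depending on the code point's width.
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyLettersAndDigits("M\u00FCller"));    // true
        System.out.println(containsOnlyLettersAndDigits("M\u00FCller 1"));  // false: contains a space
        System.out.println(containsOnlyLettersAndDigits("\uD835\uDC00"));   // true: U+1D400, a supplementary letter
    }
}
```

The call site then reads as a single line, `if (containsOnlyLettersAndDigits(name))`, which states the rule about as clearly as a regex would.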

