Replacing all non-alphanumeric characters with empty strings

陌路散爱 提交于 2019-11-26 17:07:26
Mirek Pluta

Use [^A-Za-z0-9].

Note: removed the space since that is not typically considered alphanumeric.

Try

return value.replaceAll("[^A-Za-z0-9]", "");

or

return value.replaceAll("[\\W]|_", "");

You should be aware that [^a-zA-Z] will replace characters not being itself in the character range A-Z/a-z. That means special characters like é, ß etc. or cyrillic characters and such will be removed.

If the replacement of these characters is not wanted use pre-defined character classes instead:

 someString.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "");

PS: \p{Alnum} does not achieve this effect, it acts the same as [A-Za-z0-9].

return value.replaceAll("[^A-Za-z0-9 ]", "");

This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.

saurav

You could also try this simpler regex:

 str = str.replaceAll("\\P{Alnum}", "");

Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.

I made this method for creating filenames:

public static String safeChar(String input)
{
    char[] allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_".toCharArray();
    char[] charArray = input.toString().toCharArray();
    StringBuilder result = new StringBuilder();
    for (char c : charArray)
    {
        for (char a : allowed)
        {
            if(c==a) result.append(a);
        }
    }
    return result.toString();
}
GalloCedrone

Solution:

value.replaceAll("[^A-Za-z0-9]", "")

Explanation:

[^abc] When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

Looking at the keyword as two function:

  • [(Pattern)] = match(Pattern)
  • [^(Pattern)] = notMatch(Pattern)

Moreover regarding a pattern:

  • A-Z = all characters included from A to Z

  • a-z = all characters included from a to z

  • 0=9 = all characters included from 0 to 9

Therefore it will substitute all the char NOT included in the pattern

Simple method:

public boolean isBlank(String value) {
    return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
}

public String normalizeOnlyLettersNumbers(String str) {
    if (!isBlank(str)) {
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    } else {
        return "";
    }
}
Albin
public static void main(String[] args) {
    String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

    System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));

}

output: ChlamydiasppIgGIgMIgAAbs8006

Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java

If you want to also allow alphanumeric characters which don't belong to the ascii characters set, like for instance german umlaut's, you can consider using the following solution:

 String value = "your value";

 // this could be placed as a static final constant, so the compiling is only done once
 Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

 value = pattern.matcher(value).replaceAll("");

Please note that the usage of the UNICODE_CHARACTER_CLASS flag could have an impose on performance penalty (see javadoc of this flag)

Using Guava you can easily combine different type of criteria. For your specific solution you can use:

value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z')
        .or(CharMatcher.inRange('A', 'Z'))).retainFrom(value)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!