Match non-printable/non-ASCII characters and remove them from text

My JavaScript is quite rusty, so any help with this would be great. I have a requirement to detect non-printable characters (control characters like SOH, BS, etc.) as well as extended ASCII characters such as Ž in a string and remove them, but I am not sure how to write the code.

Can anyone point me in the right direction for how to go about this? This is what I have so far:

$(document).ready(function() {
    $('.jsTextArea').blur(function() {
        var pattern = /[^\000-\031]+/gi;
        var val = $(this).val();
        if (pattern.test(val)) {
            // Show each character of the value with its index
            for (var i = 0; i < val.length; i++) {
                var res = val.charAt(i);
                alert("Character " + i + " " + res);
            }
        }
        else {
            alert("It failed");
        }
    });
});

zx81

To target characters that are not part of the printable basic ASCII range, you can use this simple regex:

[^ -~]+

Explanation: in the first 128 characters of the ASCII table, the printable range starts with the space character and ends with the tilde. These are the characters you want to keep. That range is expressed as [ -~], and the characters outside it are expressed as [^ -~]. These are the ones you want to replace. Therefore:

result = string.replace(/[^ -~]+/g, "");
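A minimal sketch of the same replace wrapped in a function (the name `stripNonPrintableAscii` is hypothetical), run on a string containing SOH (`\x01`), BS (`\x08`), `Ž` and a line break. The line break disappears as well, because `\n` is a control character outside the space-to-tilde range.

```javascript
// Hypothetical helper: keeps only printable ASCII, U+0020 (space) through U+007E (~).
function stripNonPrintableAscii(text) {
  return text.replace(/[^ -~]+/g, "");
}

const input = "A\x01B\x08C \u017Dabc\nnext line";
console.log(JSON.stringify(stripNonPrintableAscii(input)));
// prints "ABC abcnext line"
```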

No need to test first; you can process the text box content directly:

textBoxContent = textBoxContent.replace(/[^\x20-\x7E]+/g, '');

where the range \x20-\x7E (space through tilde) covers the printable part of the ASCII table.
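Because \x20-\x7E excludes tab, line feed and carriage return, applying it to multi-line text box content also joins all the lines. If line breaks should survive, one possible variant (an assumption about the requirement, with a hypothetical function name) adds `\t`, `\n` and `\r` to the allowed set. The first line of the sketch also checks that `[ -~]` and `[\x20-\x7E]` describe the same range.

```javascript
// Space is U+0020 and tilde is U+007E, so [ -~] and [\x20-\x7E] are the same class.
console.log(" ".charCodeAt(0) === 0x20, "~".charCodeAt(0) === 0x7e); // prints true true

// Hypothetical variant: printable ASCII plus tab (\t), line feed (\n) and carriage return (\r).
function stripKeepingLineBreaks(text) {
  return text.replace(/[^\x20-\x7E\t\n\r]+/g, "");
}

console.log(JSON.stringify(stripKeepingLineBreaks("line\x01 one\r\nline two\x7F\u017D")));
// prints "line one\r\nline two"
```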

Example with your code:

$('.jsTextArea').blur(function() {
    this.value = this.value.replace(/[^\x20-\x7E]+/g, '');
});
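If jQuery is not available, the same blur handler can be attached with plain DOM APIs. This is a minimal sketch: `attachSanitizer` is a hypothetical name, and it assumes elements with the `jsTextArea` class as in the question. It takes the elements as a parameter, so it should be called once the DOM is ready (the equivalent of `$(document).ready`).

```javascript
// Hypothetical helper: strips everything outside printable ASCII when each element loses focus.
function attachSanitizer(elements) {
  for (const el of elements) {
    el.addEventListener("blur", function () {
      // Inside a non-arrow listener, `this` is the element the listener is attached to.
      this.value = this.value.replace(/[^\x20-\x7E]+/g, "");
    });
  }
}

// In a browser, for example:
// attachSanitizer(document.querySelectorAll(".jsTextArea"));
```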

You have to assign a regex pattern (instead of a string) to the isNonAscii variable, then use test() to check whether it matches. test() returns true or false.

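$(document).ready(function() {
    $('.jsTextArea').blur(function() {
        // Matches any character outside printable ASCII (space through tilde).
        // No g flag: test() then keeps no lastIndex state between calls.
        var pattern = /[^\x20-\x7E]/;
        var val = $(this).val();
        if (pattern.test(val)) {
            alert("It matched");
        }
        else {
            alert("It did NOT match");
        }
    });
});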
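Two details of the pattern `/[^\000-\031]+/gi` from the question are easy to miss, and the sketch below demonstrates both. First, without the `u` flag, `\031` in a JavaScript regex is a legacy octal escape, so it means U+0019 (decimal 25), not decimal 31; the ASCII control range is written more clearly as `\x00-\x1F`. Second, with the `g` flag, `test()` records where it stopped in `lastIndex`, so a second call on the same regex object resumes from there. Dropping `g` is simpler when only a yes/no answer is needed.

```javascript
// Legacy octal escapes: \031 is U+0019 (25), so \x1A-\x1F fall outside this range.
const octalRange = /[\000-\031]/;
console.log(octalRange.test("\x19"), octalRange.test("\x1F")); // prints true false

// Hex escapes state the intended C0 control range (U+0000 to U+001F) explicitly.
const hexRange = /[\x00-\x1F]/;
console.log(hexRange.test("\x1F")); // prints true

// With the g flag, test() resumes from lastIndex, so repeated calls can disagree.
const globalRe = /[^\x20-\x7E]/g;
console.log(globalRe.test("a\x01"), globalRe.lastIndex); // prints true 2
console.log(globalRe.test("a\x01")); // prints false (search starts at index 2; lastIndex resets to 0)

// Without g, test() keeps no state and gives the same answer every time.
const plainRe = /[^\x20-\x7E]/;
console.log(plainRe.test("a\x01"), plainRe.test("a\x01")); // prints true true
```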


For those who have this problem and are looking for a 'fix all' solution, this is how I eventually fixed it (in C#):

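using System.Text.RegularExpressions;

public static string RemoveTroublesomeCharacters(string inString)
{
    if (inString == null)
    {
        return null;
    }

    // Matches any character outside the 7-bit ASCII range (U+0000 to U+007F).
    Regex regex = new Regex(@"[^\u0000-\u007F]");
    Match charMatch = regex.Match(inString);

    // Iterate over a snapshot of the input: removing characters from inString
    // while indexing into it would shift the indices and skip characters.
    string original = inString;
    for (int i = 0; i < original.Length; i++)
    {
        char ch = original[i];
        if (char.IsControl(ch))
        {
            // Remove every occurrence of this control character.
            inString = inString.Replace(ch.ToString(), string.Empty);
        }
    }

    // Remove every occurrence of each non-ASCII character the regex found.
    while (charMatch.Success)
    {
        string matchedChar = charMatch.ToString();
        inString = inString.Replace(matchedChar, string.Empty);
        charMatch = charMatch.NextMatch();
    }

    return inString;
}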

I'll break it down in a bit more detail for those less experienced:

We first loop through every character of the string and use the IsControl method of char to determine whether it is a control character.
If a control character is found, we convert it to a string and use the Replace method to replace every occurrence of it with an empty string, then continue with the rest of the string.
Once the loop is done, we use the regex defined above, which matches any character outside the standard ASCII range (U+0000 to U+007F), and again replace each matched character with an empty string. The while loop keeps going as long as charMatch.Success is true, moving to the next match with NextMatch().
Finally, once all unwanted characters have been removed, we return inString.
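For comparison with the question's JavaScript setting, here is a rough JavaScript port of the C# function that keeps its two passes (the name `removeTroublesomeCharacters` is a hypothetical translation, not an existing library function). In .NET, `char.IsControl` is true for U+0000 to U+001F and U+007F to U+009F, and the regex removes everything above U+007F, so together the two passes keep exactly U+0020 to U+007E: the same set that the single `/[^\x20-\x7E]/g` replace keeps.

```javascript
// Hypothetical JavaScript port of RemoveTroublesomeCharacters.
function removeTroublesomeCharacters(inString) {
  if (inString == null) {
    return null;
  }
  // Pass 1: control characters, the same ranges .NET's char.IsControl reports.
  const withoutControls = inString.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
  // Pass 2: anything outside 7-bit ASCII (U+0000 to U+007F).
  return withoutControls.replace(/[^\x00-\x7F]/g, "");
}

const sample = "Tab\there\x01\u017D\u0085!";
console.log(removeTroublesomeCharacters(sample) === sample.replace(/[^\x20-\x7E]/g, "")); // prints true
```

Removing every non-ASCII character also removes legitimate accented letters such as é; if those should be kept, only the control-character pass is needed.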

(Note: I still haven't figured out how to repopulate the TextBox with the modified inString value, so if anyone can point out how to do that, that would be great.)

Source: https://stackoverflow.com/questions/24229262/match-non-printable-non-ascii-characters-and-remove-from-text

Tags

javascript

regex

control-characters