Remove all non-ASCII characters from string

野的像风 2020-12-05 06:28

I have a C# routine that imports data from a CSV file, matches it against a database and then rewrites it to a file. The source file seems to have a few non-ASCII characters

7 Answers
  • 2020-12-05 06:45
    // Requires: using System.Text;
    // Note: non-ASCII characters are replaced with '?', not removed.
    string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    
  • 2020-12-05 06:46

    Here's an improvement upon the accepted answer:

    string fallbackStr = "";
    
    Encoding enc = Encoding.GetEncoding(Encoding.ASCII.CodePage,
      new EncoderReplacementFallback(fallbackStr),
      new DecoderReplacementFallback(fallbackStr));
    
    string cleanStr = enc.GetString(enc.GetBytes(inputStr));
    

    This method will replace unknown characters with the value of fallbackStr, or if fallbackStr is empty, leave them out entirely. (Note that enc can be defined outside the scope of a function.)
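
As a minimal sketch of the note about defining enc outside a function, the encoding can be built once in a static field and reused on every call. The class name AsciiCleaner and the method name Strip are hypothetical, chosen only for this illustration; the Encoding calls are the same ones used above.

```csharp
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class AsciiCleaner
{
    // Built once and reused. The empty replacement string means that
    // characters with no ASCII equivalent are dropped instead of replaced.
    private static readonly Encoding StripEncoding = Encoding.GetEncoding(
        Encoding.ASCII.CodePage,
        new EncoderReplacementFallback(""),
        new DecoderReplacementFallback(""));

    // Round-trips the input through the ASCII encoding configured above.
    public static string Strip(string input)
    {
        return StripEncoding.GetString(StripEncoding.GetBytes(input));
    }
}
```

With this sketch, AsciiCleaner.Strip("caf\u00E9") returns "caf", and a string that is already pure ASCII comes back unchanged.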

  • 2020-12-05 06:55

    If you want to test a specific character, you can use:

    if ((int)myChar <= 127)
    

    Just getting the ASCII encoding of the string will not tell you that a specific character was non-ASCII to begin with (if you care about that). See MSDN.
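
If knowing where the non-ASCII characters are matters, the same comparison can be applied to every position in the string. The following is a rough sketch only; AsciiInspector and FindNonAsciiIndexes are hypothetical names introduced for this example.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class AsciiInspector
{
    // Returns the zero-based index of every char whose code is above 127.
    public static List<int> FindNonAsciiIndexes(string text)
    {
        var indexes = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > 127) // same test as (int)myChar <= 127, inverted
            {
                indexes.Add(i);
            }
        }
        return indexes;
    }
}
```

For example, FindNonAsciiIndexes("caf\u00E9 na\u00EFve") returns the indexes 3 and 7. Each index refers to a UTF-16 code unit, so a character outside the Basic Multilingual Plane is reported as two consecutive positions.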

  • 2020-12-05 07:01

    Do it all at once:

    // Requires: using System.Text;
    public string ReturnCleanASCII(string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c > 127)  // non-ASCII; use >= 127 to drop DEL (127) as well
                continue;
            if (c < 32)   // control characters
                continue;
            if (c == ',' || c == '"')  // CSV delimiter and quote characters
                continue;
            sb.Append(c);
        }
        return sb.ToString();
    }
    
  • 2020-12-05 07:02

    Here's a simple solution:

    public static bool IsASCII(this string value)
    {
        // ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there
        return Encoding.UTF8.GetByteCount(value) == value.Length;
    }
    

    source: http://snipplr.com/view/35806/
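
One possible use of this check for the CSV import in the question is to find which lines need cleaning before any of them are rewritten. This is a sketch under assumptions: AsciiLineScanner and FindNonAsciiLines are hypothetical names, and the check is written as a plain static method rather than an extension method so that the block stands on its own.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class AsciiLineScanner
{
    // In UTF-8 only ASCII characters encode to exactly one byte each,
    // so equal counts mean the string is pure ASCII.
    public static bool IsAscii(string value)
    {
        return Encoding.UTF8.GetByteCount(value) == value.Length;
    }

    // Returns the 1-based numbers of the lines that contain at least
    // one non-ASCII character.
    public static List<int> FindNonAsciiLines(IEnumerable<string> lines)
    {
        var result = new List<int>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (!IsAscii(line))
            {
                result.Add(lineNumber);
            }
        }
        return result;
    }
}
```

Passing the three lines "id,name", "1,Jos\u00E9" and "2,Ann" yields a list containing only 2, the line that holds the accented character.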

  • 2020-12-05 07:02
        // Replaces every non-ASCII character (code above 127) with '?'.
        public string RunCharacterCheckASCII(string s)
        {
            if (string.IsNullOrEmpty(s))
                return s;

            char[] chars = s.ToCharArray();
            bool found = false;
            for (int i = 0; i < chars.Length; i++)
            {
                if (chars[i] > 127) // outside the 7-bit ASCII range
                {
                    found = true;
                    chars[i] = '?';
                }
            }
            // Allocate a new string only if something was replaced.
            return found ? new string(chars) : s;
        }
    