Replacing characters in C# (ASCII)

挽巷 2020-12-05 08:26

I have a file with characters like these: à, è, ì, ò, ù - À. What I need to do is replace those characters with plain characters, e.g. à = a, è = e, and so on. This is my c

7 answers
  •  生来不讨喜
    2020-12-05 09:12

    I don't know if this is useful, but in an internal tool that writes messages to an LED screen we use the following replacements. I'm sure there are smarter ways to do this using the Unicode tables, but this is enough for that small internal tool:

            using System.Text.RegularExpressions;

            static class LedText // class and method names are illustrative
            {
                // Replaces accented and special characters with plain ASCII look-alikes.
                public static string Simplify(string strMessage)
                {
                    strMessage = Regex.Replace(strMessage, "[éèëê]", "e");
                    strMessage = Regex.Replace(strMessage, "[ÉÈËÊ]", "E");
                    strMessage = Regex.Replace(strMessage, "[àáâãäå]", "a");
                    strMessage = Regex.Replace(strMessage, "[ÀÁÂÃÄÅ]", "A");
                    strMessage = Regex.Replace(strMessage, "[ÙÚÛÜ]", "U");
                    strMessage = Regex.Replace(strMessage, "[ùúûüµ]", "u");
                    strMessage = Regex.Replace(strMessage, "[òóôõöø]", "o");
                    strMessage = Regex.Replace(strMessage, "[ÒÓÔÕÖØ]", "O");
                    strMessage = Regex.Replace(strMessage, "[ìíîï]", "i");
                    strMessage = Regex.Replace(strMessage, "[ÌÍÎÏ]", "I");
                    strMessage = Regex.Replace(strMessage, "[š]", "s");
                    strMessage = Regex.Replace(strMessage, "[Š]", "S");
                    strMessage = Regex.Replace(strMessage, "[ñ]", "n");
                    strMessage = Regex.Replace(strMessage, "[Ñ]", "N");
                    strMessage = Regex.Replace(strMessage, "[ç]", "c");
                    strMessage = Regex.Replace(strMessage, "[Ç]", "C");
                    strMessage = Regex.Replace(strMessage, "[ÿ]", "y");
                    strMessage = Regex.Replace(strMessage, "[Ÿ]", "Y");
                    strMessage = Regex.Replace(strMessage, "[ž]", "z");
                    strMessage = Regex.Replace(strMessage, "[Ž]", "Z");
                    strMessage = Regex.Replace(strMessage, "[ð]", "d"); // eth is a "d" letter, like Ð below
                    strMessage = Regex.Replace(strMessage, "[Ð]", "D");
                    strMessage = Regex.Replace(strMessage, "[œ]", "oe");
                    strMessage = Regex.Replace(strMessage, "[Œ]", "Oe");
                    // Guillemets, curly double quotes and double primes become a plain double quote
                    strMessage = Regex.Replace(strMessage, "[«»\u201C\u201D\u201E\u201F\u2033\u2036]", "\"");
                    // The ellipsis character becomes three periods
                    strMessage = Regex.Replace(strMessage, "[\u2026]", "...");
                    return strMessage;
                }
            }
    

    One thing to note: although in most languages the text remains understandable after this treatment, that is not always the case, and it often forces the reader to rely on the context of the sentence to understand it (in French, for example, "où" (where) and "ou" (or) both become "ou"). That is not something you want if you have a choice.


    Note that the proper solution is to use the Unicode tables: decompose each precomposed character (one with an integrated diacritic) into its base character followed by its combining diacritical mark(s), i.e. Unicode normalization form D, and then remove the diacritics.
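    The decomposition approach can be sketched in C# with `string.Normalize`. This is a minimal sketch, not the answerer's code: it assumes .NET with top-level statements (C# 9 or later), and `RemoveDiacritics` is a hypothetical helper name. It decomposes the text to normalization form D, drops every character whose Unicode category is NonSpacingMark (Mn, which covers the combining accents), and recomposes what is left.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper name; sketch assuming .NET with top-level statements.
static string RemoveDiacritics(string text)
{
    // NFD splits "à" (U+00E0) into "a" (U+0061) + combining grave accent (U+0300).
    string decomposed = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        // Combining diacritics are NonSpacingMark (Mn); keep every other character.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    // Recompose the remaining characters into their canonical composed form.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacritics("à, è, ì, ò, ù - À")); // prints "a, e, i, o, u - A"
```

    Letters that have no canonical decomposition, such as ø, œ and ð, pass through unchanged, so a small replacement table is still needed for those if the output must be plain ASCII.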
