How to replace special characters with their equivalents (such as "a" for "á") in C#?

Submitted by 百般思念 on 2019-12-29 05:21:50

Question


I need to extract Portuguese text from an Excel file and create an XML file to be used by an application that doesn't support characters such as "ç", "á", "é", and others. I can't simply remove those characters; I have to replace them with their unaccented equivalents ("c", "a", "e", for example).

I assume there's a better way to do this than checking each character individually and replacing it with its counterpart. Any suggestions?


Answer 1:


You could try something like this:

// Requires: using System.Globalization; using System.Linq; using System.Text;
var decomposed = "áéö".Normalize(NormalizationForm.FormD);
var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
var newString = new string(filtered.ToArray()); // "aeo"

This decomposes each accented character into a base letter followed by its combining marks, filters out the marks, and builds a new string from what remains. Combining diacritics belong to the NonSpacingMark (Mn) Unicode category.
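The decompose-and-filter approach described above can be wrapped in a reusable method. The following is only a sketch: the class name `DiacriticsDemo` and the method name `RemoveDiacritics` are made up for illustration and are not part of .NET, and the final recomposition to FormC is an optional extra step that the answer does not include.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper name; not part of the .NET API.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits each accented letter into a base letter plus combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Keep every character except the combining marks (category Mn).
        char[] kept = decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();

        // Optional: recompose whatever is left into the more common FormC.
        return new string(kept).Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("ação, coração, é"));
        // prints "acao, coracao, e"
    }
}
```

Characters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged, so they would still need an explicit mapping if they can occur in the input.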




Answer 2:


// Requires: using System.Collections.Generic; using System.Text;
string text = "ação"; // sample input; substitute the text to replace characters in

var replacements = new Dictionary<char, char>();

// Add your characters to the replacements dictionary:
// key: char to replace
// value: replacement char

replacements.Add('ç', 'c');
// ... (add the remaining mappings here)

var replaced = new StringBuilder(text.Length);
foreach (char character in text)
{
    // Use the mapped character if one exists; otherwise keep the original.
    if (replacements.TryGetValue(character, out char replacement))
    {
        replaced.Append(replacement);
    }
    else
    {
        replaced.Append(character);
    }
}

// replaced.ToString() is now your converted text



Answer 3:


For future reference, this is exactly what I ended up with:

// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
string temp = stringToConvert.Normalize(NormalizationForm.FormD);
IEnumerable<char> filtered = temp;
filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
string final = new string(filtered.ToArray());



Answer 4:


This solution performs better:

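// Requires: using System.Text; using System.Text.RegularExpressions;
string test = "áéíóúç";

// Note: this removes every character except ASCII letters and spaces, including digits and punctuation.
string result = Regex.Replace(test.Normalize(NormalizationForm.FormD), "[^A-Za-z ]", string.Empty);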
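A negated character class like the one above also strips digits and punctuation, which may not be wanted for general text. A possible variation, shown here only as a sketch, keeps the regex approach but matches the combining marks themselves through the `\p{Mn}` Unicode category. The class name `RegexDiacriticsDemo` and the method name `StripCombiningMarks` are made up for illustration, and no performance comparison is implied.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexDiacriticsDemo
{
    // Hypothetical helper name, used only for this illustration.
    public static string StripCombiningMarks(string input)
    {
        // Decompose accented letters into a base letter plus combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // \p{Mn} matches the Unicode "Mark, nonspacing" category, so only the
        // accents are removed; digits and punctuation are left alone.
        return Regex.Replace(decomposed, @"\p{Mn}+", string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(StripCombiningMarks("São Paulo, 2019: ação!"));
        // prints "Sao Paulo, 2019: acao!"
    }
}
```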


Source: https://stackoverflow.com/questions/2393887/how-to-replace-special-characters-with-their-equivalent-such-as-%c3%a1-for-a
