Question
I have a problem inserting a string into a database because of an encoding issue.
The string comes from an external RSS feed. In a web browser it looks fine. Even in the debugger the text appears fine. If I copy the string to Notepad, the result is also fine.

But in Notepad++ I could see that the string uses combining characters. After switching the encoding to ANSI, both parts of the combination show up separately, e.g.
á is displayed as a´
(In Notepad++ it is like having two characters, one over the other. I can even select ... half of the character.)

I googled a lot and tried several different approaches to this problem. I really want to find a clean way to convert a string with combining diacritics into simple, database-compatible UTF-8 characters.
Any help? Thank you so much!
Answer 1:
This should work for you
output = output.Normalize(NormalizationForm.FormC);
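This little test prints 3, 2, 3. In the middle case, A and its diacritic are correctly combined into a single precomposed character (2 bytes in UTF-8). T with a circumflex has no precomposed form, so the last case stays at 3 bytes.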
Console.WriteLine(Encoding.UTF8.GetByteCount(("A\u0302")));
Console.WriteLine(Encoding.UTF8.GetByteCount(("A\u0302").Normalize(NormalizationForm.FormC)));
Console.WriteLine(Encoding.UTF8.GetByteCount(("T\u0302").Normalize(NormalizationForm.FormC)));
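If it helps, here is a minimal sketch of wrapping the normalization in a helper that you could call on each feed string before the insert. The class and method names (`DiacriticsNormalizer`, `ToComposed`) are made up for illustration; only `String.IsNormalized` and `String.Normalize` come from the framework.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class DiacriticsNormalizer
{
    // Returns the composed (NFC) form of the input, so that a base letter
    // followed by a combining mark becomes one precomposed character
    // whenever Unicode defines one.
    public static string ToComposed(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // IsNormalized lets us return the original string untouched
        // when it is already in composed form.
        return input.IsNormalized(NormalizationForm.FormC)
            ? input
            : input.Normalize(NormalizationForm.FormC);
    }
}
```

For example, `DiacriticsNormalizer.ToComposed("a\u0301")` returns the single character U+00E1 (á). Sequences with no precomposed equivalent, such as `"T\u0302"`, come back unchanged, so the database column still needs to accept combining characters in that case.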
Answer 2:
On a Mac, you can fix this by running the following command in Terminal:
iconv -f utf-8-mac -t utf-8 inputfile >outputfile
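If you are not on a Mac, something roughly similar can be done for a whole file from .NET. This is only a sketch: `FileNormalizer` and `ConvertToComposed` are made-up names, the file is assumed to be UTF-8, and NFC normalization is assumed to be close enough to what the `utf-8-mac` conversion does (the two are not guaranteed to match for every character).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FileNormalizer
{
    // Reads a UTF-8 text file, composes combining sequences (NFC),
    // and writes the result to a new UTF-8 file.
    public static void ConvertToComposed(string inputPath, string outputPath)
    {
        string text = File.ReadAllText(inputPath, Encoding.UTF8);
        string composed = text.Normalize(NormalizationForm.FormC);

        // UTF8Encoding(false) writes the output without a byte order mark.
        File.WriteAllText(outputPath, composed, new UTF8Encoding(false));
    }
}
```

This loads the whole file into memory, which is fine for typical feed dumps but would need a streaming approach for very large files.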
Source: https://stackoverflow.com/questions/20889305/converting-combining-diacritics-to-simple-utf