Question
I have the following example code:
var inputString = "ñaáme";
inputString = inputString.Replace('ñ', '\u00F1');
var normalizedString = inputString.Normalize(NormalizationForm.FormD);
var result = Regex.Replace(normalizedString, @"[^ñÑa-zA-Z0-9\s]*", string.Empty);
return result.Replace('\u00F1', 'ñ'); // naame :(
I need to normalize the text without removing the "ñ" characters.
I followed this example, but it is for Java and has not worked for me.
I want the result to be: "ñaame".
Answer 1:
You can match every run of Unicode letters other than ñ and the ASCII
letters (which do not need normalization) with the regex (?i)[\p{L}-[ña-z]]+
and normalize only those matches. Then remove any combining marks from the resulting string.
Use
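// Requires: using System.Globalization; using System.Linq; using System.Text; using System.Text.RegularExpressions;
var inputString = "ñaáme";
// Decompose only the runs of letters that are neither ñ/Ñ nor ASCII letters.
var decomposed = Regex.Replace(inputString, @"(?i)[\p{L}-[ña-z]]+",
    m => m.Value.Normalize(NormalizationForm.FormD));
// Drop the combining marks produced by the decomposition.
var result = string.Concat(decomposed
    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
Console.Write(result);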
See the C# demo
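The code in the question loses the ñ because FormD decomposes it into a plain n followed by a combining tilde, and the later regex strips that tilde just as it strips the accent on á. A minimal sketch that prints the decomposed code points (the class and method names here are hypothetical, chosen for illustration):

```csharp
using System;
using System.Linq;
using System.Text;

static class DecompositionDemo
{
    // Returns the UTF-16 code units of the FormD-normalized string, e.g. "U+006E U+0303".
    public static string CodePoints(string s)
    {
        var decomposed = s.Normalize(NormalizationForm.FormD);
        return string.Join(" ", decomposed.Select(c => $"U+{(int)c:X4}"));
    }

    static void Main()
    {
        Console.WriteLine(CodePoints("\u00F1")); // ñ -> U+006E U+0303 (n + combining tilde)
        Console.WriteLine(CodePoints("\u00E1")); // á -> U+0061 U+0301 (a + combining acute)
    }
}
```

This is why the answer normalizes only the matched letters instead of the whole string: a precomposed ñ is never decomposed, so it has no combining mark to lose.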
Pattern description
(?i) - ignore case modifier
[ - start of a character class
\p{L} - any Unicode letter
-[ - other than
ña-z - ñ and ASCII letters
] - end of the subtraction class
]+ - 1 or more occurrences.
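The pattern and the mark-removal step can be wrapped in a small reusable helper. The following is only a sketch: the names AccentStripper and RemoveAccentsExceptEnye are hypothetical, and it assumes every ñ in the input is the precomposed character U+00F1 (a decomposed n + U+0303 would still lose its tilde in the final filter).

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class AccentStripper
{
    // Hypothetical helper wrapping the approach from the answer.
    // Assumption: ñ/Ñ appear in precomposed form (U+00F1 / U+00D1).
    public static string RemoveAccentsExceptEnye(string input)
    {
        // Decompose only runs of letters that are neither ñ/Ñ nor ASCII letters.
        var decomposed = Regex.Replace(input, @"(?i)[\p{L}-[ña-z]]+",
            m => m.Value.Normalize(NormalizationForm.FormD));

        // Remove the combining marks left behind by the decomposition.
        return string.Concat(decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }

    static void Main()
    {
        Console.WriteLine(RemoveAccentsExceptEnye("ñaáme")); // ñaame
        Console.WriteLine(RemoveAccentsExceptEnye("El Niño comió piña"));
    }
}
```

Because the subtraction [ña-z] keeps ñ out of every match, it never reaches Normalize, while accented letters such as á or ó are decomposed and then lose their marks.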
Source: https://stackoverflow.com/questions/47488491/remove-accents-in-string-except-%c3%b1