Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., cha
After more searching elsewhere:
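// needs <unicode/normlzr.h>, <unicode/unistr.h>, <string>, <ctype.h>
UErrorCode status = U_ZERO_ERROR;
UnicodeString result;
// 's16' is the UTF-16 string to have diacritics removed.
// NFKD decomposes each accented character into its base letter
// followed by separate combining marks.
Normalizer::normalize( s16, UNORM_NFKD, 0, result, status );
if ( U_FAILURE( status ) ) {
    // complain
}
// code to convert the normalized UTF-16 'result' to UTF-8 std::string 's8' elided
string buf8;
buf8.reserve( s8.length() );
// Keep only ASCII bytes; the combining marks are multi-byte in UTF-8
// and are dropped here.
for ( string::const_iterator i = s8.begin(); i != s8.end(); ++i ) {
    char const c = *i;
    if ( isascii( static_cast<unsigned char>( c ) ) )
        buf8.push_back( c );
}
// result is in buf8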
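The filtering loop is O(n) in the length of the string. Note that it drops every non-ASCII character, not only the combining marks, so characters that have no ASCII decomposition (e.g. 'ß' or 'ø') disappear entirely.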
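For anyone who wants to try the filtering step on its own, here is a minimal sketch that does not need ICU. The function name `strip_non_ascii` is made up for this example, and it assumes the input is UTF-8 that has already been decomposed (NFD or NFKD) by some other means:

```cpp
#include <string>

// Hypothetical helper (not part of ICU): keeps only the 7-bit ASCII
// bytes of a UTF-8 string. On decomposed input this removes the
// combining marks and leaves the base letters.
std::string strip_non_ascii(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (unsigned char c : utf8) {
        // Every byte of a multi-byte UTF-8 sequence has the high bit set,
        // so testing for < 0x80 selects exactly the ASCII characters.
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```

With decomposed input such as `"re" "\xCC\x81" "sume" "\xCC\x81"` (each 'e' followed by U+0301 COMBINING ACUTE ACCENT) this returns `"resume"`. With precomposed input such as `"r" "\xC3\xA9" "sum" "\xC3\xA9"` it returns `"rsum"`, because the whole accented letter is a single non-ASCII character. That is why the normalization step has to come first.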