Code to strip diacritical marks using ICU

醉话见心 2020-12-18 08:29

Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., cha

2 answers
  •  北荒 (original poster)
    2020-12-18 08:53

    After more searching elsewhere:

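    // needs <unicode/normlzr.h>, <unicode/unistr.h>, <string>, <ctype.h>
    UErrorCode status = U_ZERO_ERROR;
    UnicodeString result;

    // 's16' is the UTF-16 string to have diacritics removed;
    // NFKD splits each accented character into base letter + combining marks
    Normalizer::normalize( s16, UNORM_NFKD, 0, result, status );
    if ( U_FAILURE( status ) ) {
      // complain
    }

    // code to convert the normalized UTF-16 'result' to UTF-8 std::string 's8' elided

    string buf8;
    buf8.reserve( s8.length() );
    for ( string::const_iterator i = s8.begin(); i != s8.end(); ++i ) {
      char const c = *i;
      // keep ASCII bytes only; this drops the combining marks (and any other non-ASCII character)
      if ( isascii( static_cast<unsigned char>( c ) ) )
        buf8.push_back( c );
    }
    // result is in buf8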
    

    The filtering loop is O(n) in the length of the string.
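
    To make the filtering step concrete, here is a minimal sketch that uses only the standard library. It assumes the input has already been NFKD-normalized and converted to UTF-8 (the ICU steps above, not repeated here). The name `keep_ascii` is hypothetical and not part of ICU. Every byte of a multi-byte UTF-8 sequence has its high bit set, so keeping only bytes below 0x80 removes the combining marks and leaves the base letters.

```cpp
#include <string>

// Hypothetical helper (not from the post, not an ICU function):
// keep only the ASCII bytes (0x00-0x7F) of a UTF-8 string.
// Assumes 's8' was NFKD-normalized before being converted to UTF-8,
// so an accented letter is a base letter followed by combining marks.
std::string keep_ascii(const std::string& s8) {
    std::string out;
    out.reserve(s8.length());
    for (unsigned char c : s8) {
        // Lead and continuation bytes of multi-byte sequences are >= 0x80.
        if (c < 0x80)
            out.push_back(static_cast<char>(c));
    }
    return out;
}
```

    For example, "e" followed by U+0301 (bytes 0xCC 0x81 in UTF-8) comes out as "e". Note that this also deletes characters that have no decomposition to an ASCII base, such as "ø" or text in non-Latin scripts, so it is only suitable when losing those characters is acceptable.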
