How to remove accents and tilde in a C++ std::string

半阙折子戏 2020-12-15 21:26

I have a problem with a C++ string that contains several words in Spanish, so many of them carry accents and tildes. I want to replace them with their not

8 answers
  •  夕颜 (original poster)
    2020-12-15 21:49

        /// Replaces accented characters with their unaccented base letters and
        /// returns an ASCII-compliant string.
        ///
        /// Also gets rid of hidden control characters such as NUL and TAB; only \n and \r are kept.
        /// Characters that have no canonical decomposition are not transliterated by the code below;
        /// they are dropped by the final ASCII filter.
        ///
        /// Tests with accents and foreign characters:
        /// Before: "äæǽaeöœoeüueÄAeÜUeÖOeÀÁÂÃÄÅǺĀĂĄǍΑΆẢẠẦẪẨẬẰẮẴẲẶАAàáâãåǻāăąǎªαάảạầấẫẩậằắẵẳặаaБBбbÇĆĈĊČCçćĉċčcДDдdÐĎĐΔDjðďđδdjÈÉÊËĒĔĖĘĚΕΈẼẺẸỀẾỄỂỆЕЭEèéêëēĕėęěέεẽẻẹềếễểệеэeФFфfĜĞĠĢΓГҐGĝğġģγгґgĤĦHĥħhÌÍÎÏĨĪĬǏĮİΗΉΊΙΪỈỊИЫIìíîïĩīĭǐįıηήίιϊỉịиыїiĴJĵjĶΚКKķκкkĹĻĽĿŁΛЛLĺļľŀłλлlМMмmÑŃŅŇΝНNñńņňʼnνнnÒÓÔÕŌŎǑŐƠØǾΟΌΩΏỎỌỒỐỖỔỘỜỚỠỞỢОOòóôõōŏǒőơøǿºοόωώỏọồốỗổộờớỡởợоoПPпpŔŖŘΡРRŕŗřρрrŚŜŞȘŠΣСSśŝşșšſσςсsȚŢŤŦτТTțţťŧтtÙÚÛŨŪŬŮŰŲƯǓǕǗǙǛŨỦỤỪỨỮỬỰУUùúûũūŭůűųưǔǖǘǚǜυύϋủụừứữửựуuÝŸŶΥΎΫỲỸỶỴЙYýÿŷỳỹỷỵйyВVвvŴWŵwŹŻŽΖЗZźżžζзzÆǼAEßssIJIJijijŒOEƒf'ξksπpβvμmψpsЁYoёyoЄYeєyeЇYiЖZhжzhХKhхkhЦTsцtsЧChчchШShшshЩShchщshchЪъЬьЮYuюyuЯYaяya"
        /// After:  "aaeooeuueAAeUUeOOeAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaaaaaaaaaaaaaaaaBbCCCCCCccccccDdDDjddjEEEEEEEEEEEEEEEEEEeeeeeeeeeeeeeeeeeeFfGGGGGgggggHHhhIIIIIIIIIIIIIiiiiiiiiiiiiJJjjKKkkLLLLllllMmNNNNNnnnnnOOOOOOOOOOOOOOOOOOOOOOooooooooooooooooooooooPpRRRRrrrrSSSSSSssssssTTTTttttUUUUUUUUUUUUUUUUUUUUUUUUuuuuuuuuuuuuuuuuuuuuuuuYYYYYYYYyyyyyyyyVvWWwwZZZZzzzzAEssIJijOEf'kspvmpsYoyoYeyeYiZhzhKhkhTstsChchShshShchshchYuyuYaya"
        /// 
        /// Tests with invalid 'special hidden characters':
        /// Before: "\0\0\000\0000Bj��rk�\'\"\\\0\a\b\f\n\r\t\v\u0020���oacu\'\\\'te�"
        /// After:  "00000Bjrk'\"\\\n\r oacu'\\'te"
        /// 
        /// 
        // Requires: using System.Globalization; using System.Text;
        private string Normalize(string stringToClean)
        {
            // Decompose each character into its base letter plus combining marks (NFD).
            string normalizedString = stringToClean.Normalize(NormalizationForm.FormD);
            StringBuilder buffer = new StringBuilder(normalizedString.Length);

            // Drop the combining marks (accents, tildes, diaereses) and keep the base letters.
            foreach (char character in normalizedString)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(character) != UnicodeCategory.NonSpacingMark)
                {
                    buffer.Append(character);
                }
            }

            // Recompose (NFC) before filtering down to ASCII.
            string preAsciiCompliant = buffer.ToString().Normalize(NormalizationForm.FormC);
            StringBuilder asciiCompliant = new StringBuilder(preAsciiCompliant.Length);

            foreach (char character in preAsciiCompliant)
            {
                // Keep printable ASCII plus line feed (10) and carriage return (13).
                // Everything else, including NUL, TAB and any non-ASCII character, is rejected.
                if ((character >= 32 && character < 127) || character == '\n' || character == '\r')
                {
                    asciiCompliant.Append(character);
                }
            }
            return asciiCompliant.ToString().Trim(); // Remove whitespace at the start and end, if any
        }
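
The code above is C#, while the question asks about a C++ `std::string`. The C++ standard library has no Unicode normalization, so as a rough sketch only, the same idea can be approximated with a lookup table. The function name `strip_spanish_accents` and the table are hypothetical, not part of the original answer. The sketch assumes the string holds valid UTF-8 and only covers the accented letters used in Spanish. Like the C# version, it drops control characters other than \n and \r and any other non-ASCII character, but it does not trim the result.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not from the original answer): folds the accented
// letters used in Spanish to plain ASCII. Assumes `utf8` is valid UTF-8.
std::string strip_spanish_accents(const std::string& utf8)
{
    // Each key is the UTF-8 byte sequence of one accented letter.
    static const std::unordered_map<std::string, char> table = {
        {"\xC3\xA1", 'a'}, {"\xC3\xA9", 'e'}, {"\xC3\xAD", 'i'},
        {"\xC3\xB3", 'o'}, {"\xC3\xBA", 'u'}, {"\xC3\xBC", 'u'},
        {"\xC3\xB1", 'n'},
        {"\xC3\x81", 'A'}, {"\xC3\x89", 'E'}, {"\xC3\x8D", 'I'},
        {"\xC3\x93", 'O'}, {"\xC3\x9A", 'U'}, {"\xC3\x9C", 'U'},
        {"\xC3\x91", 'N'},
    };

    std::string out;
    out.reserve(utf8.size());

    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);

        if (c < 0x80) {
            // ASCII: keep printable characters plus \n and \r, drop the rest.
            if ((c >= 32 && c < 127) || c == '\n' || c == '\r') {
                out += static_cast<char>(c);
            }
            ++i;
            continue;
        }

        // Work out the length of the multi-byte sequence from its lead byte.
        std::size_t len = 1;
        if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        if (i + len > utf8.size()) len = utf8.size() - i; // truncated input

        // Emit the ASCII replacement if the sequence is in the table;
        // any other non-ASCII character is dropped.
        const auto it = table.find(utf8.substr(i, len));
        if (it != table.end()) {
            out += it->second;
        }
        i += len;
    }
    return out;
}
```

For example, `strip_spanish_accents("ni\xC3\xB1o")` returns `"nino"`. For anything beyond a fixed set of letters, a Unicode library such as ICU is likely a better fit than a hand-written table.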
    
