Best way to shorten UTF8 string based on byte length

感情败类 2020-12-10 12:14

A recent project called for importing data into an Oracle database. The program that will do this is a C# .NET 3.5 app and I'm using the Oracle.DataAccess connection libra

9 Answers
  •  轮回少年
    2020-12-10 12:22

    Following Oren Trutner's comment, here are two more solutions to the problem.
    The first counts the bytes to remove by walking backward from the end of the string one character at a time, so the whole string is not re-encoded on every iteration.

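    // Requires: using System.Text;
    string str = "朣楢琴执执 瑩浻牡楧硰执执獧浻牡楧敬瑦 瀰 絸朣杢执獧扻捡杫潲湵 潣";
    int maxBytesLength = 30;
    byte[] bytesArr = Encoding.UTF8.GetBytes(str);
    int bytesToRemove = 0;
    int lastIndexInString = str.Length - 1;
    // Drop characters from the end until the remaining bytes fit.
    while (bytesArr.Length - bytesToRemove > maxBytesLength)
    {
        // A surrogate pair is two chars but a single 4-byte UTF-8 sequence, so count it as one unit.
        int charCount = lastIndexInString > 0 && char.IsSurrogatePair(str, lastIndexInString - 1) ? 2 : 1;
        bytesToRemove += Encoding.UTF8.GetByteCount(str.ToCharArray(lastIndexInString - charCount + 1, charCount));
        lastIndexInString -= charCount;
    }
    string trimmedString = Encoding.UTF8.GetString(bytesArr, 0, bytesArr.Length - bytesToRemove);
    // Encoding.UTF8.GetByteCount(trimmedString) is now <= maxBytesLength.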
    

    An even more efficient (and more maintainable) solution: decode only the first maxBytesLength bytes of the encoded string, then drop the last character of the result, because the cut may have split it and left it corrupted.

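    string str = "朣楢琴执执 瑩浻牡楧硰执执獧浻牡楧敬瑦 瀰 絸朣杢执獧扻捡杫潲湵 潣";
    int maxBytesLength = 30;
    byte[] bytes = Encoding.UTF8.GetBytes(str);
    // Decode at most maxBytesLength bytes; the cut may land inside a multi-byte character.
    string trimmedWithDirtyLastChar = Encoding.UTF8.GetString(bytes, 0, Math.Min(bytes.Length, maxBytesLength));
    // Drop the possibly corrupted last character only when something was actually cut.
    string trimmedString = bytes.Length <= maxBytesLength ? str : trimmedWithDirtyLastChar.Substring(0, trimmedWithDirtyLastChar.Length - 1);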
    

    The only downside of the second solution is that it may remove a perfectly valid last character when the cut happens to fall exactly on a character boundary. Since the string is being truncated anyway, that is often acceptable.
    Thanks to Shhade, who thought of the second solution.
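
If losing a valid final character matters, one alternative is to cut at a UTF-8 character boundary instead. The sketch below is not part of the answer above, and `Utf8Truncation` and `TruncateToUtf8Bytes` are hypothetical names chosen for illustration. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx: starting at the byte limit, it backs up until the first dropped byte begins a new character. It also returns the string unchanged when it already fits.

```csharp
using System;
using System.Text;

static class Utf8Truncation
{
    // Hypothetical helper: returns the longest prefix of `value` whose
    // UTF-8 encoding is at most `maxBytes` bytes, without splitting a
    // multi-byte character.
    public static string TruncateToUtf8Bytes(string value, int maxBytes)
    {
        if (value == null) throw new ArgumentNullException("value");
        if (maxBytes < 0) throw new ArgumentOutOfRangeException("maxBytes");

        byte[] bytes = Encoding.UTF8.GetBytes(value);
        if (bytes.Length <= maxBytes) return value; // already fits

        // bytes[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is inside a character, so
        // step back until it is the first byte of a character.
        int end = maxBytes;
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) end--;

        return Encoding.UTF8.GetString(bytes, 0, end);
    }
}
```

For example, `Utf8Truncation.TruncateToUtf8Bytes("a\u00E9", 2)` returns `"a"`, because the two-byte é does not fit after the one-byte a. This keeps code points whole, but it can still separate a base character from a combining mark that follows it.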
