Best way to shorten UTF8 string based on byte length

感情败类 2020-12-10 12:14

A recent project called for importing data into an Oracle database. The program that will do this is a C# .Net 3.5 app and I'm using the Oracle.DataAccess connection libra

9 Answers
  •  心在旅途
    2020-12-10 12:23

    Here are two possible solutions: a LINQ one-liner that processes the input from left to right, and a traditional for loop that processes it from right to left. Which direction is faster depends on the string length, the allowed byte length, and the number and distribution of multibyte characters, so it is hard to give a general recommendation. The choice between LINQ and traditional code is probably a matter of taste (or maybe speed).

    If speed matters, one could accumulate the byte length of each character until the maximum length is reached, instead of computing the byte length of the whole prefix in every iteration. But I am not sure this will work, because I don't know UTF-8 encoding well enough; I could theoretically imagine that the byte length of a string does not equal the sum of the byte lengths of its characters.
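    The accumulation idea should work: UTF-8 encodes each Unicode code point independently, so the byte length of a string is the sum of the byte lengths of its code points (1 byte below U+0080, 2 below U+0800, 3 for the rest of the Basic Multilingual Plane, 4 above it). The catch in C# is that a `Char` is a UTF-16 code unit, so a character outside the BMP (an emoji, for example) is a surrogate pair of two `Char`s that together take 4 bytes and must be counted and kept as one unit. Below is a minimal single-pass sketch under those assumptions; `Utf8Accumulate` and `LimitByteLengthAccumulate` are hypothetical names, and it assumes `Encoding.UTF8`'s default behaviour of writing a lone surrogate as U+FFFD (3 bytes).

    ```csharp
    using System;

    public static class Utf8Accumulate
    {
        // Hypothetical helper: adds up the UTF-8 size of each code point and stops
        // before the first one that would push the total past maxLength.
        public static String LimitByteLengthAccumulate(String input, Int32 maxLength)
        {
            Int32 bytes = 0;  // UTF-8 bytes kept so far
            Int32 i = 0;      // index of the next char to examine
            while (i < input.Length)
            {
                Char c = input[i];
                Int32 chars = 1;  // UTF-16 code units in this code point
                Int32 size;       // UTF-8 bytes for this code point
                if (c < 0x80)
                    size = 1;
                else if (c < 0x800)
                    size = 2;
                else if (i + 1 < input.Length && Char.IsSurrogatePair(c, input[i + 1]))
                {
                    size = 4;     // outside the BMP: one code point, two chars
                    chars = 2;
                }
                else
                    size = 3;     // rest of the BMP; a lone surrogate becomes U+FFFD (3 bytes)

                if (bytes + size > maxLength)
                    break;
                bytes += size;
                i += chars;
            }
            return input.Substring(0, i);
        }
    }
    ```

    This makes a single pass over the string, whereas recomputing `GetByteCount` on every prefix is quadratic in the worst case.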

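    using System;
    using System.Linq;
    using System.Text;

    public static class Utf8Limit
    {
        // Left to right: keep chars while the UTF-8 byte count of the prefix fits.
        public static String LimitByteLength(String input, Int32 maxLength)
        {
            Char[] kept = input
                .TakeWhile((c, i) =>
                    Encoding.UTF8.GetByteCount(input.Substring(0, i + 1)) <= maxLength)
                .ToArray();

            // Don't end on the first half of a surrogate pair.
            Int32 n = kept.Length;
            if (n > 0 && n < input.Length && Char.IsSurrogatePair(kept[n - 1], input[n]))
                n--;
            return new String(kept, 0, n);
        }

        // Right to left: return the longest prefix whose UTF-8 byte count fits.
        public static String LimitByteLength2(String input, Int32 maxLength)
        {
            for (Int32 i = input.Length - 1; i >= 0; i--)
            {
                // Skip cut points that would split a surrogate pair.
                if (i + 1 < input.Length && Char.IsSurrogatePair(input[i], input[i + 1]))
                    continue;
                if (Encoding.UTF8.GetByteCount(input.Substring(0, i + 1)) <= maxLength)
                {
                    return input.Substring(0, i + 1);
                }
            }

            return String.Empty;
        }
    }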
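    A different approach avoids the repeated `GetByteCount` calls altogether: encode the string once and, if it is too long, move the cut point back from `maxLength` until it no longer falls inside a multi-byte sequence (UTF-8 continuation bytes always have the bit pattern `10xxxxxx`), then decode only the kept bytes. This is a swapped-in technique, not one of the two solutions in this answer. `Utf8EncodeOnce` and `LimitByteLengthEncodeOnce` are hypothetical names, the sketch assumes `maxLength` is non-negative, and any lone surrogates in a truncated input come back as U+FFFD after the round trip.

    ```csharp
    using System;
    using System.Text;

    public static class Utf8EncodeOnce
    {
        // Hypothetical helper: encodes once, backs the cut off any UTF-8
        // continuation byte, then decodes only the bytes that are kept.
        public static String LimitByteLengthEncodeOnce(String input, Int32 maxLength)
        {
            Byte[] bytes = Encoding.UTF8.GetBytes(input);
            if (bytes.Length <= maxLength)
                return input;  // already short enough

            // bytes[cut] is the first byte dropped. If it is a continuation byte
            // (10xxxxxx), the cut is inside a character, so step back to its lead byte.
            Int32 cut = maxLength;
            while (cut > 0 && (bytes[cut] & 0xC0) == 0x80)
                cut--;

            return Encoding.UTF8.GetString(bytes, 0, cut);
        }
    }
    ```

    The cost is one encode plus one decode, so it runs in linear time, at the price of allocating a byte array as large as the encoded input.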
    
