Best way to shorten UTF8 string based on byte length

感情败类 2020-12-10 12:14

A recent project called for importing data into an Oracle database. The program that will do this is a C# .NET 3.5 app and I'm using the Oracle.DataAccess connection libra

9 answers
  •  情书的邮戳
    2020-12-10 12:17

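    A shorter version of ruffin's answer. It takes advantage of the design of UTF-8: every continuation byte has the bit pattern 10xxxxxx, so you can tell whether the cut point falls inside a multi-byte character and back up to the start of that character: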

        using System.Text;

        // Extension methods must live in a static class; the class name here is arbitrary.
        public static class StringExtensions
        {
            public static string LimitUtf8ByteCount(this string s, int n)
            {
                // Quick test: most strings already fit, so we rarely need to trim.
                if (Encoding.UTF8.GetByteCount(s) <= n)
                    return s;
                // The string is longer than n bytes, so a[n] below is always in range.
                var a = Encoding.UTF8.GetBytes(s);
                // If the cut lands inside a character (a[n] is a continuation byte, 10xxxxxx) ...
                if (n > 0 && (a[n] & 0xC0) == 0x80)
                {
                    // ... back up past the continuation bytes to the lead byte (11xxxxxx),
                    // so that the whole partial sequence is dropped.
                    while (--n > 0 && (a[n] & 0xC0) == 0x80)
                        ;
                }
                // Decode only the first n bytes, which now end on a character boundary.
                return Encoding.UTF8.GetString(a, 0, n);
            }
        }
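    To see how the boundary check behaves, here is a small usage sketch. The class name `Utf8TruncateDemo` and the sample strings are made up for illustration; the method body is copied from the answer so that the example compiles on its own. It assumes `n` is never negative, since a negative limit makes `GetString` throw.

```csharp
using System;
using System.Text;

// Hypothetical demo class; only LimitUtf8ByteCount comes from the answer.
public static class Utf8TruncateDemo
{
    // Copy of the answer's method so that this example is self-contained.
    public static string LimitUtf8ByteCount(this string s, int n)
    {
        if (Encoding.UTF8.GetByteCount(s) <= n)
            return s;
        var a = Encoding.UTF8.GetBytes(s);
        if (n > 0 && (a[n] & 0xC0) == 0x80)
        {
            while (--n > 0 && (a[n] & 0xC0) == 0x80)
                ;
        }
        return Encoding.UTF8.GetString(a, 0, n);
    }

    public static void Main()
    {
        // U+00E9 encodes as the two bytes C3 A9, so this string is
        // 6 bytes long in UTF-8: 68 C3 A9 6C 6C 6F.
        string s = "h\u00E9llo";
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));             // 6

        // A limit of 2 would cut between C3 and A9, so both bytes are dropped.
        Console.WriteLine(s.LimitUtf8ByteCount(2) == "h");            // True

        // A limit of 3 ends exactly on a character boundary.
        Console.WriteLine(s.LimitUtf8ByteCount(3) == "h\u00E9");      // True

        // A limit at or above the byte length returns the string unchanged.
        Console.WriteLine(s.LimitUtf8ByteCount(6) == s);              // True

        // U+1F600 takes four bytes (F0 9F 98 80); any cut inside it removes
        // the whole character, so no lone surrogate is produced.
        string t = "a\U0001F600";
        Console.WriteLine(t.LimitUtf8ByteCount(4) == "a");            // True
    }
}
```

    Note that the result can be shorter than the limit by up to three bytes, because a partially included character is removed entirely rather than being replaced or split.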
    
