Calculating length in UTF-8 of Java String without actually encoding it

你的背包 2020-12-03 04:45

Does anyone know if the standard Java library (any version) provides a means of calculating the length of the binary encoding of a string (specifically UTF-8 in this case) w

4 Answers
  •  鱼传尺愫
    2020-12-03 05:05

    The best method I could come up with is to use CharsetEncoder to write repeatedly into the same temporary buffer:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.StandardCharsets;

    public int getEncodedLength(CharBuffer src, CharsetEncoder encoder)
        throws CharacterCodingException
    {
        // CharsetEncoder.flush fails if encode is not called with >0 chars
        if (!src.hasRemaining())
            return 0;

        // scratch buffer that is overwritten on every pass; its contents are never read
        final ByteBuffer outputBuffer = ByteBuffer.allocate(1024);

        // encoding loop: count the bytes produced by each pass, then discard them
        int bytes = 0;
        CoderResult status;
        do
        {
            status = encoder.encode(src, outputBuffer, true);
            if (status.isError())
                status.throwException();
            bytes += outputBuffer.position();

            outputBuffer.clear();
        }
        while (status.isOverflow());

        // flush any remaining buffered state
        status = encoder.flush(outputBuffer);
        if (status.isError() || status.isOverflow())
            status.throwException();
        bytes += outputBuffer.position();

        return bytes;
    }

    public int getUtf8Length(String str) throws CharacterCodingException
    {
        // a new encoder reports malformed input (e.g. unpaired surrogates) as an error
        return getEncodedLength(CharBuffer.wrap(str),
            StandardCharsets.UTF_8.newEncoder());
    }
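
For UTF-8 specifically, the length can probably also be computed arithmetically from the UTF-16 code units, with no encoder and no buffer. The following is only a minimal sketch and not part of the answer above. The class name Utf8Length and the method utf8Length are hypothetical. It assumes an unpaired surrogate should count as one byte (the '?' that String.getBytes substitutes), whereas the CharsetEncoder approach throws CharacterCodingException on such input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption made for illustration.
final class Utf8Length {
    private Utf8Length() {}

    /**
     * Counts the bytes a UTF-8 encoding of s would occupy, without encoding it.
     * Assumption: an unpaired surrogate counts as 1 byte, mirroring the '?'
     * replacement used by String.getBytes(StandardCharsets.UTF_8).
     */
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        final int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // U+0000..U+007F
            } else if (c < 0x800) {
                bytes += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                      // valid surrogate pair: U+10000..U+10FFFF
                i++;                             // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                      // unpaired surrogate, counted as '?'
            } else {
                bytes += 3;                      // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String sample = "a\u00e9\u20ac\uD83D\uDE00";
        // Both lines should print the same number.
        System.out.println(utf8Length(sample));
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The result is a long because a very long string of three-byte characters could overflow an int. If malformed input should be rejected instead of counted, the unpaired-surrogate branch could throw an exception.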
    
