Calculating length in UTF-8 of Java String without actually encoding it

你的背包 2020-12-03 04:45

Does anyone know if the standard Java library (any version) provides a means of calculating the length of the binary encoding of a string (specifically UTF-8 in this case) w

4 answers

鱼传尺愫 (original poster)

2020-12-03 05:05

The best method I could come up with is to use CharsetEncoder to write repeatedly into the same temporary buffer:

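import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public int getEncodedLength(CharBuffer src, CharsetEncoder encoder)
    throws CharacterCodingException
{
    // CharsetEncoder.flush fails if encode is not called with >0 chars
    if (!src.hasRemaining())
        return 0;

    // encode into a byte buffer that is repeatedly overwritten,
    // so memory use stays constant regardless of input size
    final ByteBuffer outputBuffer = ByteBuffer.allocate(1024);

    // encoding loop: each pass fills the buffer, counts the bytes, then discards them
    int bytes = 0;
    CoderResult status;
    do
    {
        // endOfInput is true because src already holds the entire input
        status = encoder.encode(src, outputBuffer, true);
        if (status.isError())
            status.throwException();
        bytes += outputBuffer.position();

        outputBuffer.clear();
    }
    while (status.isOverflow());

    // flush any remaining buffered state
    status = encoder.flush(outputBuffer);
    if (status.isError() || status.isOverflow())
        status.throwException();
    bytes += outputBuffer.position();

    return bytes;
}

public int getUtf8Length(String str) throws CharacterCodingException
{
    // StandardCharsets.UTF_8 avoids the by-name lookup of Charset.forName("UTF-8")
    return getEncodedLength(CharBuffer.wrap(str),
        StandardCharsets.UTF_8.newEncoder());
}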
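
For UTF-8 specifically, the length can also be computed without any encoder or buffer, because the byte count of each code point follows directly from its value. The sketch below is not part of the answer above; the class name Utf8Length and the method utf8Length are made up for illustration. It assumes that an unpaired surrogate should count as one byte, which is meant to mirror String.getBytes substituting '?' for malformed input; the encoder-based method above would throw CharacterCodingException in that case instead.

```java
final class Utf8Length {
    private Utf8Length() {}

    /**
     * Counts the bytes a CharSequence would occupy in UTF-8 by inspecting
     * its UTF-16 chars directly; nothing is actually encoded.
     */
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        final int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                 // U+0000..U+007F: 1 byte
            } else if (c < 0x800) {
                bytes += 2;                 // U+0080..U+07FF: 2 bytes
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                 // valid surrogate pair: 4 bytes
                i++;                        // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                 // assumption: unpaired surrogate counted as '?'
            } else {
                bytes += 3;                 // rest of the BMP: 3 bytes
            }
        }
        return bytes;
    }
}
```

Calling Utf8Length.utf8Length(str) should agree with str.getBytes(StandardCharsets.UTF_8).length for well-formed strings, while allocating nothing. The int result can overflow for extremely large inputs, since a char can take up to 3 bytes.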
