Does anyone know if the standard Java library (any version) provides a means of calculating the length of the binary encoding of a string (specifically UTF-8 in this case) w
The best method I could come up with is to use CharsetEncoder to write repeatedly into the same temporary buffer:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public int getEncodedLength(CharBuffer src, CharsetEncoder encoder)
        throws CharacterCodingException
{
    // CharsetEncoder.flush fails if encode is not called with >0 chars
    if (!src.hasRemaining())
        return 0;

    // encode into a byte buffer that is repeatedly overwritten
    final ByteBuffer outputBuffer = ByteBuffer.allocate(1024);

    // encoding loop: count the bytes produced by each pass, then discard them
    int bytes = 0;
    CoderResult status;
    do
    {
        status = encoder.encode(src, outputBuffer, true);
        if (status.isError())
            status.throwException();
        bytes += outputBuffer.position();
        outputBuffer.clear();
    }
    while (status.isOverflow());

    // flush any remaining buffered state
    status = encoder.flush(outputBuffer);
    if (status.isError() || status.isOverflow())
        status.throwException();
    bytes += outputBuffer.position();
    return bytes;
}

public int getUtf8Length(String str) throws CharacterCodingException
{
    return getEncodedLength(CharBuffer.wrap(str),
            Charset.forName("UTF-8").newEncoder());
}
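
For comparison, and for UTF-8 only, the length can also be counted directly from the UTF-16 code units without any buffer. This is a minimal sketch, not something the standard library provides: the class name Utf8LengthSketch and the method utf8Length are made up for illustration, and it assumes an unpaired surrogate should be rejected rather than replaced.

```java
// Hypothetical helper for illustration; not part of the standard Java library.
final class Utf8LengthSketch {

    // Counts the bytes that UTF-8 would need for s without producing them.
    // Assumption: an unpaired surrogate is treated as an error.
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // U+0000..U+007F
            } else if (c < 0x800) {
                bytes += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                      // surrogate pair: U+10000 and above
                i++;                             // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                throw new IllegalArgumentException(
                        "Unpaired surrogate at index " + i);
            } else {
                bytes += 3;                      // rest of the BMP
            }
        }
        return bytes;
    }
}
```

For well-formed input, Utf8LengthSketch.utf8Length(str) should agree with the length of the array returned by str.getBytes for UTF-8, but it is tied to that one encoding, whereas the CharsetEncoder version above works for any charset.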