Is there an easy way to avoid dealing with text encoding problems?
Use:
new CharSequenceInputStream(html, StandardCharsets.UTF_8);
This approach does not require converting the content upfront to a String and then to a byte[], which allocates a lot more heap memory when the report is large. Instead, it converts to bytes on the fly as the stream is read, directly from the StringBuffer.
CharSequenceInputStream comes from the Apache Commons IO project.
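To illustrate what "converting on the fly" means, here is a rough stdlib-only sketch of the idea. It is not the Commons IO source: the class names LazyEncodingInputStream and LazyEncodingDemo, the 1024-byte buffer size, and the choice to replace unencodable characters are all my own assumptions for the example. The point is that only a small fixed-size byte buffer is ever allocated, no matter how large the StringBuffer is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; NOT the Commons IO implementation.
class LazyEncodingInputStream extends InputStream {
    private final CharsetEncoder encoder;
    private final CharBuffer chars;  // read-only view over the source, no copy
    private final ByteBuffer bytes;  // small fixed-size staging buffer
    private boolean endOfInput = false;
    private boolean flushed = false;

    LazyEncodingInputStream(CharSequence source, Charset charset) {
        // Assumption: unencodable input is replaced rather than rejected.
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        this.chars = CharBuffer.wrap(source);
        this.bytes = ByteBuffer.allocate(1024);
        this.bytes.flip(); // start with nothing to read
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (flushed) {
                return -1; // all chars encoded and encoder flushed
            }
            refill();
        }
        return bytes.get() & 0xFF;
    }

    // Encodes the next slice of characters into the staging buffer.
    private void refill() {
        bytes.compact(); // switch the buffer to write mode
        if (!endOfInput) {
            CoderResult result = encoder.encode(chars, bytes, true);
            if (result.isUnderflow()) {
                endOfInput = true; // every char has been consumed
            }
        } else {
            CoderResult result = encoder.flush(bytes);
            if (result.isUnderflow()) {
                flushed = true;
            }
        }
        bytes.flip(); // switch back to read mode
    }
}

class LazyEncodingDemo {
    public static void main(String[] args) throws IOException {
        StringBuffer html = new StringBuffer("<html><body>caf\u00e9</body></html>");
        try (InputStream in = new LazyEncodingInputStream(html, StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            System.out.println("bytes read: " + count);
        }
    }
}
```

In real code I would still use the Commons IO class rather than something like this. A production version would at least also override read(byte[], int, int) so that bulk reads do not go through the single-byte path.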