Is there an easy way to avoid dealing with text encoding problems?
commons-io 2.0 has WriterOutputStream
Use:
new CharSequenceInputStream(html, StandardCharsets.UTF_8);
This approach does not require an upfront conversion to String and then to byte[], which allocates a lot more heap memory when the report is large. It converts to bytes on the fly as the stream is read, directly from the StringBuffer.
It uses CharSequenceInputStream from the Apache Commons IO project.
A warning when using WriterOutputStream: it doesn't always handle writing binary data to a file properly, or in the same way as a regular output stream. I had an issue with this that took me a while to track down.
If you can, I'd recommend using an output stream as your base, and if you need to write strings, use an OutputStreamWriter wrapper around the stream to do it. It is far more reliable to convert text to bytes than the other way around, which is likely why WriterOutputStream is not a part of the standard Java library.
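Here is a minimal sketch of that recommendation, using only the standard library. The class name MixedOutput and the helper writeReport are made up for illustration. The point is that text goes through an OutputStreamWriter while raw bytes go straight to the underlying stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class MixedOutput {
    // Hypothetical helper: writes a text header, then raw bytes, to the same stream.
    static void writeReport(OutputStream out, String header, byte[] payload) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(header);
        // Flush so the encoded text reaches the stream before the raw bytes do.
        writer.flush();
        out.write(payload);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeReport(out, "h\u00e9llo\n", new byte[] {(byte) 0xFF, 0x00});
        System.out.println(out.size() + " bytes written");
    }
}
```

The flush between the two writes matters: OutputStreamWriter buffers encoded bytes internally, so without it the binary payload could land ahead of the text.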
You can use Cactoos (no static methods, only objects):
You can convert the other way around too:
If you are starting off with a String you can also do the following:
new ByteArrayInputStream(inputString.getBytes("UTF-8"))
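A slightly fuller sketch of the same idea, in case it helps. It uses the getBytes(Charset) overload with StandardCharsets.UTF_8, which avoids the checked UnsupportedEncodingException that getBytes("UTF-8") declares. The class and method names are hypothetical, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStream {
    // Hypothetical helper: encodes the whole String up front and wraps the bytes.
    static InputStream toInputStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = toInputStream("<p>caf\u00e9</p>")) {
            // Decode with the same charset to round-trip the text.
            byte[] bytes = in.readAllBytes(); // Java 9+
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        }
    }
}
```

Note that this holds the full byte[] in memory, so it suits small to moderate strings better than very large reports.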
Are you trying to write the contents of a Reader to an OutputStream? If so, you'll have an easier time wrapping the OutputStream in an OutputStreamWriter and writing the chars from the Reader to the Writer, instead of trying to convert the reader to an InputStream:
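// Wrap the OutputStream in a Writer that encodes chars as UTF-8.
// try-with-resources flushes and closes the writer (and the underlying stream),
// even if an exception is thrown.
try (Writer writer = new BufferedWriter(
        new OutputStreamWriter(urlConnection.getOutputStream(), StandardCharsets.UTF_8))) {
    char[] cbuf = new char[1024];
    int charsRead;
    // data is the source Reader; read() returns -1 at end of input
    while ((charsRead = data.read(cbuf)) != -1) {
        writer.write(cbuf, 0, charsRead);
    }
}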
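If you are on Java 10 or later, the loop above can probably be replaced with Reader.transferTo(Writer). The following is a self-contained sketch under that assumption. ReaderCopy and copy are names invented for this example, and a ByteArrayOutputStream stands in for the connection's output stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ReaderCopy {
    // Hypothetical helper: copies all chars from the Reader to the stream as UTF-8.
    // It flushes but does not close 'out'; the caller owns the stream.
    static void copy(Reader data, OutputStream out) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        data.transferTo(writer); // Java 10+
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new StringReader("na\u00efve text"), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```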