Question
I am looking for a utility class/method that takes a large String and returns an InputStream.
If the String is small, I can just do:
InputStream is = new ByteArrayInputStream(str.getBytes(<charset>));
But when the String is large (1 MB, 10 MB or more), a byte array 1 to 2 times (or more?) as large as my String is allocated on the spot. (And since you can't know exactly how many bytes to allocate before all the encoding is done, I think other arrays/buffers must be allocated before the final byte array is.)
I have performance requirements, and want to optimize this operation.
Ideally, I think, the class/method I am looking for would encode the characters on the fly, one small block at a time, as the InputStream is being consumed, so there is no big surge of memory allocation.
Looking at the source code of Apache Commons IOUtils.toInputStream(..), I see that it also converts the String to a big byte array in one go.
And StringBufferInputStream is deprecated and does not do the job properly.
Is there such a utility class/method anywhere? Or can I just write a couple of lines of code to do this?
The functional need for this is that, elsewhere, I am using a utility method that takes an InputStream and streams out the bytes from this InputStream.
I haven't seen other people looking for something like this. Am I mistaken about something?
I have started writing a custom class for this, but would stop if there is a better/proper/right solution/correction to my need.
Answer 1:
The Java built-in libraries assume you'd only need to go from chars to bytes on output, not on input. The Apache Commons IO library has ReaderInputStream, however, which can wrap a StringReader to get what you want.
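For readers who cannot add Commons IO, the same "encode on the fly" idea can be sketched with only the JDK. This is a minimal sketch, not the Commons IO implementation: the class name EncodingStringInputStream, the 8192-byte chunk size, and the choice to replace unencodable characters are all assumptions made for illustration.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

/** Hypothetical class: encodes a String lazily, one small chunk at a time. */
class EncodingStringInputStream extends InputStream {
    private final CharBuffer chars;      // read-only view of the String, no copy
    private final CharsetEncoder encoder;
    private final ByteBuffer bytes = ByteBuffer.allocate(8192); // arbitrary chunk size
    private boolean encodeDone = false;  // all chars have been consumed by the encoder
    private boolean finished = false;    // encoder has also been flushed

    EncodingStringInputStream(String s, Charset charset) {
        this.chars = CharBuffer.wrap(s);
        // Assumption: replace bad input instead of failing mid-stream.
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        bytes.flip(); // start with an empty, readable buffer
    }

    /** Refills the chunk when it is drained; returns false at end of stream. */
    private boolean ensureBytes() {
        while (!bytes.hasRemaining()) {
            if (finished) return false;
            bytes.clear();
            if (!encodeDone) {
                // Encodes as many chars as fit into the chunk.
                CoderResult r = encoder.encode(chars, bytes, true);
                if (r.isUnderflow()) encodeDone = true;
            }
            if (encodeDone) {
                // Some encoders emit trailing bytes on flush.
                CoderResult r = encoder.flush(bytes);
                if (r.isUnderflow()) finished = true;
            }
            bytes.flip();
        }
        return true;
    }

    @Override
    public int read() {
        return ensureBytes() ? (bytes.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (!ensureBytes()) return -1;
        int n = Math.min(len, bytes.remaining());
        bytes.get(b, off, n); // throws IndexOutOfBoundsException on a bad range
        return n;
    }
}
```

Under these assumptions, new EncodingStringInputStream(str, StandardCharsets.UTF_8) could be passed to the method that consumes an InputStream. Beyond the String itself, only the fixed-size chunk buffer is allocated.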
Answer 2:
For me there is a fundamental problem: why do you have such huge Strings in memory in the first place?
Anyway, you can try this:
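// Needs java.io.*, java.nio.* and java.nio.charset.*
// Note: this still encodes the whole String in one go.
public static InputStream largeStringToBytes(final String tooLarge,
    final Charset charset) throws CharacterCodingException
{
    // Fail on unmappable characters instead of silently replacing them.
    final CharsetEncoder encoder = charset.newEncoder()
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    final ByteBuffer buf = encoder.encode(CharBuffer.wrap(tooLarge));
    // The backing array may be longer than the encoded data, so pass only the valid range.
    return new ByteArrayInputStream(buf.array(),
        buf.arrayOffset() + buf.position(), buf.remaining());
}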
Answer 3:
If you are passing the large string as a parameter, then the memory is already allocated: in Java a String always lives on the heap, and only a reference to it is passed. The only way I can see to avoid this would be to create a tree on disk and stream back a character at a time as you walk the tree. If you have multiple large strings, perhaps you can index them in a trie or a DAWG and walk that structure. This will eliminate many of the duplicate characters between strings. But I would need to know more about what the strings represent to assist further.
Answer 4:
Implement your own String-backed input stream:
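import java.io.IOException;
import java.io.InputStream;

class StringifiedInputStream extends InputStream {
    private int idx = 0;
    private final String str;
    private final int len;

    StringifiedInputStream(String str) {
        this.str = str;
        this.len = str.length();
    }

    @Override
    public int read() throws IOException {
        if (idx >= len)
            return -1;
        // read() must return 0..255. A plain (byte) cast would return negative
        // values for chars >= 0x80, including a false -1 (end of stream) for 0xFF.
        // Only the low 8 bits of each char are kept, so this suits ISO-8859-1 text only.
        return str.charAt(idx++) & 0xFF;
    }
}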
This is slow, but it streams the bytes without duplicating the data in a byte array. Override the three-argument read(byte[], int, int) method as well if speed is an issue.
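A bulk read for this approach might look like the following. It is only a sketch under the same limitation as the class above: each char is narrowed to one byte, so it is valid for ISO-8859-1 text only. The class name Latin1StringInputStream is hypothetical.

```java
import java.io.InputStream;

/** Hypothetical variant of the class above with a bulk read; ISO-8859-1 only. */
class Latin1StringInputStream extends InputStream {
    private final String str;
    private int idx = 0;

    Latin1StringInputStream(String str) {
        this.str = str;
    }

    @Override
    public int read() {
        if (idx >= str.length()) return -1;
        return str.charAt(idx++) & 0xFF; // keep the result in 0..255
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) return 0;
        int remaining = str.length() - idx;
        if (remaining <= 0) return -1;
        int n = Math.min(len, remaining);
        // Copy n chars straight into the caller's array, one byte per char.
        for (int i = 0; i < n; i++) {
            b[off + i] = (byte) str.charAt(idx + i);
        }
        idx += n;
        return n;
    }
}
```

Characters above U+00FF are silently truncated here, so text in any other encoding needs a real encoder.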
Source: https://stackoverflow.com/questions/27908790/any-util-class-method-to-take-a-large-string-and-return-an-inputstream