I'm getting a java.lang.OutOfMemoryError: Java heap space even with GSON Streaming.
{"result":"OK","base64":"JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...."}
The base64 value can be up to 200 MB long. GSON takes much more memory than that (3 GB). When I try to store the base64 value in a variable, I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuilder.append(StringBuilder.java:204)
at com.google.gson.stream.JsonReader.nextQuotedValue(JsonReader.java:1014)
at com.google.gson.stream.JsonReader.nextString(JsonReader.java:815)
What is the best way to handle this kind of field?
You're getting the OutOfMemoryError because GSON's nextString() returns a single string that is accumulated in a StringBuilder, and that builder has to grow very large for such a value. With an issue like this, you have to work with the intermediate data, since there is no other choice. Unfortunately, GSON does not offer any way to process huge literals piece by piece.
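To get a feel for why the builder needs so much more than the payload itself, here is a rough sketch of the growth arithmetic. It assumes a pre-Java 9 StringBuilder (the expandCapacity frame in the stack trace suggests one), which stores two bytes per char and grows to twice the old capacity plus two, briefly keeping both the old and the new array alive while copying. The class and method names are made up for illustration, and the 1024-char chunk size used in main is an assumption about how much gets appended at a time.

```java
final class StringBuilderGrowthEstimate {

    private StringBuilderGrowthEstimate() {
    }

    // Simulates appending totalChars in chunks of chunkChars and returns the
    // largest number of bytes held by the old and new char[] at the same time.
    // Assumption: capacity starts at 16 and grows to max(old * 2 + 2, needed).
    static long peakCharArrayBytes(final long totalChars, final int chunkChars) {
        long capacity = 16;
        long count = 0;
        long peakBytes = capacity * 2; // 2 bytes per char
        while ( count < totalChars ) {
            final long needed = Math.min(count + chunkChars, totalChars);
            if ( needed > capacity ) {
                final long grown = Math.max(capacity * 2 + 2, needed);
                // The old array is still referenced while it is copied into the new one
                peakBytes = Math.max(peakBytes, (capacity + grown) * 2);
                capacity = grown;
            }
            count = needed;
        }
        return peakBytes;
    }

    public static void main(final String... args) {
        // A 200 MB base64 literal is roughly 200 million ASCII characters
        final long peak = peakCharArrayBytes(200_000_000L, 1024);
        System.out.println("Estimated peak char[] bytes: " + peak);
    }
}
```

On top of that, toString() copies the characters once more, so the real footprint is higher still. The sketch does not claim to account for the whole 3 GB, only to show that the heap has to hold several times the size of the literal.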
I'm not sure whether you can change the response payload, but if you can't, you might want to implement your own JSON reader, or "hack" the existing JsonReader to make it work in a streaming fashion. The EnhancedGson25JsonReader class below is based on GSON 2.5 and makes heavy use of reflection because JsonReader hides its state very carefully.
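For the first option, writing your own reader, the core of it could look like the following minimal sketch. It is not GSON code: QuotedStringCopier and its methods are hypothetical names, it only handles double-quoted strings, and it expects the reader to be positioned right after the opening quote. The point is only that each character goes straight to the Writer, so memory use stays constant no matter how long the literal is.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class QuotedStringCopier {

    private QuotedStringCopier() {
    }

    // Copies the body of a JSON string to out and stops after the closing quote.
    // Wrap file-based readers and writers in buffered ones before calling this.
    static long copyQuotedString(final Reader in, final Writer out)
            throws IOException {
        long written = 0;
        while ( true ) {
            final int c = in.read();
            if ( c == -1 ) {
                throw new EOFException("Unterminated string");
            } else if ( c == '"' ) {
                return written;
            } else if ( c == '\\' ) {
                out.write(readEscape(in));
            } else {
                out.write(c);
            }
            written++;
        }
    }

    // Decodes the escape sequence that follows a backslash.
    private static char readEscape(final Reader in)
            throws IOException {
        final int c = in.read();
        switch ( c ) {
        case '"': case '\\': case '/': return (char) c;
        case 'b': return '\b';
        case 'f': return '\f';
        case 'n': return '\n';
        case 'r': return '\r';
        case 't': return '\t';
        case 'u':
            int code = 0;
            for ( int i = 0; i < 4; i++ ) {
                final int r = in.read();
                final int digit = r < 0 ? -1 : Character.digit((char) r, 16);
                if ( digit < 0 ) {
                    throw new IOException("Malformed unicode escape");
                }
                code = code << 4 | digit;
            }
            return (char) code;
        default:
            throw new IOException("Invalid escape sequence");
        }
    }

    // Convenience wrapper for trying the method out on small inputs.
    static String copyToString(final String textAfterOpeningQuote) {
        final StringWriter out = new StringWriter();
        try {
            copyQuotedString(new StringReader(textAfterOpeningQuote), out);
        } catch ( final IOException ex ) {
            throw new UncheckedIOException(ex);
        }
        return out.toString();
    }
}
```

A complete reader would also need to tokenize objects, names, numbers and so on, which is why patching the existing JsonReader can be the smaller job.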
EnhancedGson25JsonReader.java
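import java.io.IOException;
import java.io.Reader;
import com.google.gson.stream.JsonReader;
// spyField and spyMethod are static imports from FieldSpy and MethodSpy
final class EnhancedGson25JsonReader
extends JsonReader {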
// A listener that accepts slices of the internal character buffers.
// Accepting a single string built from such buffers would be a total waste of memory as well.
interface ISlicedStringListener {
void accept(char[] buffer, int start, int length)
throws IOException;
}
// The constants can be just copied
/** @see JsonReader#PEEKED_NONE */
private static final int PEEKED_NONE = 0;
/** @see JsonReader#PEEKED_SINGLE_QUOTED */
private static final int PEEKED_SINGLE_QUOTED = 8;
/** @see JsonReader#PEEKED_DOUBLE_QUOTED */
private static final int PEEKED_DOUBLE_QUOTED = 9;
// Here is a bunch of spies made to "spy" on the parent class's state
private final FieldSpy<Integer> peeked;
private final MethodSpy<Integer> doPeek;
private final MethodSpy<Integer> getLineNumber;
private final MethodSpy<Integer> getColumnNumber;
private final FieldSpy<char[]> buffer;
private final FieldSpy<Integer> pos;
private final FieldSpy<Integer> limit;
private final MethodSpy<Character> readEscapeCharacter;
private final FieldSpy<Integer> lineNumber;
private final FieldSpy<Integer> lineStart;
private final MethodSpy<Boolean> fillBuffer;
private final MethodSpy<IOException> syntaxError;
private final FieldSpy<Integer> stackSize;
private final FieldSpy<int[]> pathIndices;
private EnhancedGson25JsonReader(final Reader reader)
throws NoSuchFieldException, NoSuchMethodException {
super(reader);
peeked = spyField(JsonReader.class, this, "peeked");
doPeek = spyMethod(JsonReader.class, this, "doPeek");
getLineNumber = spyMethod(JsonReader.class, this, "getLineNumber");
getColumnNumber = spyMethod(JsonReader.class, this, "getColumnNumber");
buffer = spyField(JsonReader.class, this, "buffer");
pos = spyField(JsonReader.class, this, "pos");
limit = spyField(JsonReader.class, this, "limit");
readEscapeCharacter = spyMethod(JsonReader.class, this, "readEscapeCharacter");
lineNumber = spyField(JsonReader.class, this, "lineNumber");
lineStart = spyField(JsonReader.class, this, "lineStart");
fillBuffer = spyMethod(JsonReader.class, this, "fillBuffer", int.class);
syntaxError = spyMethod(JsonReader.class, this, "syntaxError", String.class);
stackSize = spyField(JsonReader.class, this, "stackSize");
pathIndices = spyField(JsonReader.class, this, "pathIndices");
}
static EnhancedGson25JsonReader getEnhancedGson25JsonReader(final Reader reader) {
try {
return new EnhancedGson25JsonReader(reader);
} catch ( final NoSuchFieldException | NoSuchMethodException ex ) {
throw new RuntimeException(ex);
}
}
// This method has been copied and reworked from the nextString() implementation
void nextSlicedString(final ISlicedStringListener listener)
throws IOException {
int p = peeked.get();
if ( p == PEEKED_NONE ) {
p = doPeek.get();
}
switch ( p ) {
case PEEKED_SINGLE_QUOTED:
nextQuotedSlicedValue('\'', listener);
break;
case PEEKED_DOUBLE_QUOTED:
nextQuotedSlicedValue('"', listener);
break;
default:
throw new IllegalStateException("Expected a string but was " + peek()
+ " at line " + getLineNumber.get()
+ " column " + getColumnNumber.get()
+ " path " + getPath()
);
}
peeked.accept(PEEKED_NONE);
pathIndices.get()[stackSize.get() - 1]++;
}
// The following method is also a copy-paste that was patched for the "spies".
// It's, in principle, the same as the source one, but it has one more buffer singleCharBuffer
// in order not to add another method to the ISlicedStringListener interface (enjoy lambdas as much as possible).
// Note that the main difference between these two methods is that this one
// does not aggregate a single string value, but just delegates the internal
// buffers to call-sites, so the latter ones might do anything with the buffers.
/**
* @see JsonReader#nextQuotedValue(char)
*/
private void nextQuotedSlicedValue(final char quote, final ISlicedStringListener listener)
throws IOException {
final char[] buffer = this.buffer.get();
final char[] singleCharBuffer = new char[1];
while ( true ) {
int p = pos.get();
int l = limit.get();
int start = p;
while ( p < l ) {
final int c = buffer[p++];
if ( c == quote ) {
pos.accept(p);
listener.accept(buffer, start, p - start - 1);
return;
} else if ( c == '\\' ) {
pos.accept(p);
listener.accept(buffer, start, p - start - 1);
singleCharBuffer[0] = readEscapeCharacter.get();
listener.accept(singleCharBuffer, 0, 1);
p = pos.get();
l = limit.get();
start = p;
} else if ( c == '\n' ) {
lineNumber.accept(lineNumber.get() + 1);
lineStart.accept(p);
}
}
listener.accept(buffer, start, p - start);
pos.accept(p);
if ( !fillBuffer.apply(just1) ) {
throw syntaxError.apply(justUnterminatedString);
}
}
}
// Save some memory
private static final Object[] just1 = { 1 };
private static final Object[] justUnterminatedString = { "Unterminated string" };
}
FieldSpy.java
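import java.lang.reflect.Field;
import java.util.function.Consumer;
import java.util.function.Supplier;
final class FieldSpy<T>
implements Supplier<T>, Consumer<T> {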
private final Object instance;
private final Field field;
private FieldSpy(final Object instance, final Field field) {
this.instance = instance;
this.field = field;
}
static <T> FieldSpy<T> spyField(final Class<?> declaringClass, final Object instance, final String fieldName)
throws NoSuchFieldException {
final Field field = declaringClass.getDeclaredField(fieldName);
field.setAccessible(true);
return new FieldSpy<>(instance, field);
}
@Override
public T get() {
try {
@SuppressWarnings("unchecked")
final T value = (T) field.get(instance);
return value;
} catch ( final IllegalAccessException ex ) {
throw new RuntimeException(ex);
}
}
@Override
public void accept(final T value) {
try {
field.set(instance, value);
} catch ( final IllegalAccessException ex ) {
throw new RuntimeException(ex);
}
}
}
MethodSpy.java
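import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Function;
import java.util.function.Supplier;
final class MethodSpy<T>
implements Function<Object[], T>, Supplier<T> {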
private static final Object[] emptyObjectArray = {};
private final Object instance;
private final Method method;
private MethodSpy(final Object instance, final Method method) {
this.instance = instance;
this.method = method;
}
static <T> MethodSpy<T> spyMethod(final Class<?> declaringClass, final Object instance, final String methodName, final Class<?>... parameterTypes)
throws NoSuchMethodException {
final Method method = declaringClass.getDeclaredMethod(methodName, parameterTypes);
method.setAccessible(true);
return new MethodSpy<>(instance, method);
}
@Override
public T get() {
// my javac generates useless new Object[0] if no args passed
return apply(emptyObjectArray);
}
@Override
public T apply(final Object[] arguments) {
try {
@SuppressWarnings("unchecked")
final T value = (T) method.invoke(instance, arguments);
return value;
} catch ( final IllegalAccessException | InvocationTargetException ex ) {
throw new RuntimeException(ex);
}
}
}
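As a quick illustration of what the two spy classes do under the hood, here is a stripped-down sketch that reads a private field and calls a private method through reflection. Hidden is a hypothetical stand-in for JsonReader, and none of the names below come from GSON.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical class that hides its state, standing in for JsonReader.
final class Hidden {

    private int pos = 3;

    private int advance(final int by) {
        pos += by;
        return pos;
    }
}

final class ReflectionSpyDemo {

    private ReflectionSpyDemo() {
    }

    // Reads the private field "pos", the way FieldSpy.get() does.
    static int readPos(final Hidden instance) {
        try {
            final Field field = Hidden.class.getDeclaredField("pos");
            field.setAccessible(true);
            return (Integer) field.get(instance);
        } catch ( final ReflectiveOperationException ex ) {
            throw new RuntimeException(ex);
        }
    }

    // Invokes the private method "advance", the way MethodSpy.apply() does.
    static int callAdvance(final Hidden instance, final int by) {
        try {
            final Method method = Hidden.class.getDeclaredMethod("advance", int.class);
            method.setAccessible(true);
            return (Integer) method.invoke(instance, by);
        } catch ( final ReflectiveOperationException ex ) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(final String... args) {
        final Hidden hidden = new Hidden();
        System.out.println(readPos(hidden));
        System.out.println(callAdvance(hidden, 4));
        System.out.println(readPos(hidden));
    }
}
```

Unlike this sketch, FieldSpy and MethodSpy look up the Field and Method objects once and reuse them, which matters because the reader touches them on every buffer refill.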
HugeJsonReaderDemo.java
And here is a demo that uses that method to read a huge JSON file and redirect its string values to another file.
public static void main(final String... args)
throws IOException {
try ( final EnhancedGson25JsonReader input = getEnhancedGson25JsonReader(new InputStreamReader(new FileInputStream("./huge.json")));
final Writer output = new OutputStreamWriter(new BufferedOutputStream(new FileOutputStream("./huge.json.STRINGS"))) ) {
while ( input.hasNext() ) {
final JsonToken token = input.peek();
switch ( token ) {
case BEGIN_OBJECT:
input.beginObject();
break;
case NAME:
input.nextName();
break;
case STRING:
input.nextSlicedString(output::write);
break;
default:
throw new AssertionError(token);
}
}
}
}
I successfully extracted the fields above to a file. The input file was 544 MB (570 425 371 bytes) long and was generated from the following JSON chunks:
{"result":"OK","base64":"
JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC
× 16777216 (2^24)"}
And the result is (since I just redirect all strings to the file):
OK
JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC
× 16777216 (2^24)
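Since the field in the question is base64, the slices could also be decoded on the fly instead of being written out as text. The following is only a sketch under a few assumptions: Base64SliceDecoder is a hypothetical helper, the payload is plain (non-MIME) base64 without line breaks, and its accept method deliberately has the same shape as ISlicedStringListener.accept so that a method reference can be used as the listener. Slices can end anywhere, so the characters are collected into a block whose size is a multiple of four before each decode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

final class Base64SliceDecoder {

    private final OutputStream out;
    private final byte[] block = new byte[4096]; // must be a multiple of 4
    private int filled;

    Base64SliceDecoder(final OutputStream out) {
        this.out = out;
    }

    // Collects the slice into the block and decodes the block whenever it is full.
    void accept(final char[] buffer, final int start, final int length)
            throws IOException {
        for ( int i = start; i < start + length; i++ ) {
            final char c = buffer[i];
            if ( c > 127 ) {
                throw new IOException("Not a base64 character: " + c);
            }
            block[filled++] = (byte) c;
            if ( filled == block.length ) {
                out.write(Base64.getDecoder().decode(block));
                filled = 0;
            }
        }
    }

    // Decodes whatever is left; call this once after the string has ended.
    void finish()
            throws IOException {
        if ( filled > 0 ) {
            out.write(Base64.getDecoder().decode(Arrays.copyOf(block, filled)));
            filled = 0;
        }
    }

    // Small in-memory driver that feeds the decoder slices of a fixed length.
    static byte[] decodeInSlices(final String base64, final int sliceLength) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final Base64SliceDecoder decoder = new Base64SliceDecoder(out);
        final char[] chars = base64.toCharArray();
        try {
            for ( int start = 0; start < chars.length; start += sliceLength ) {
                decoder.accept(chars, start, Math.min(sliceLength, chars.length - start));
            }
            decoder.finish();
        } catch ( final IOException ex ) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }
}
```

With the reader above, this would be wired up roughly as input.nextSlicedString(decoder::accept) followed by decoder.finish(), with the decoder wrapping a buffered FileOutputStream for the resulting PDF.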
I think you've run into a very interesting issue. It would be nice to have some feedback from the GSON team on a possible API enhancement.
Source: https://stackoverflow.com/questions/39615673/best-way-to-handle-huge-fields-with-gson-jsonreader