How to improve performance of deserializing objects from HttpsURLConnection.getInputStream()?

Question

I have a client-server application where the server sends some binary data to the client and the client has to deserialize objects from that byte stream according to a custom binary format. The data is sent via an HTTPS connection and the client uses HttpsURLConnection.getInputStream() to read it.

I implemented a DataDeserializer that takes an InputStream and deserializes it completely. It works by performing multiple inputStream.read(buffer) calls with small buffers (usually fewer than 100 bytes). While trying to improve overall performance, I also tried different implementations here. One change improved this class's performance significantly (I now use a ByteBuffer to read primitive types rather than doing it manually with byte shifting), but in combination with the network stream no difference shows up. See the section below for more details.
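For reference, the two ways of decoding a primitive contrasted above look roughly like this, assuming little-endian data (as in the deserializer shown in the Update). The class and method names are illustrative only. Wrapping a new ByteBuffer per value, as done here for brevity, allocates an object each time; a real deserializer would reuse one buffer, as DataDeserializer does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PrimitiveDecoding {

    // Manual decoding: assemble a little-endian int from four bytes with masks and shifts.
    static int readIntManually(byte[] b, int off) {
        return (b[off] & 0xFF)
                | (b[off + 1] & 0xFF) << 8
                | (b[off + 2] & 0xFF) << 16
                | (b[off + 3] & 0xFF) << 24;
    }

    // ByteBuffer decoding: view the same four bytes and let getInt() do the work.
    static int readIntWithByteBuffer(byte[] b, int off) {
        return ByteBuffer.wrap(b, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readIntManually(data, 0)));       // 12345678
        System.out.println(Integer.toHexString(readIntWithByteBuffer(data, 0))); // 12345678
    }
}
```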

Quick summary of my issue

Deserializing from the network stream takes way too long, even though I have shown that the network and the deserializer are each fast on their own. Are there any common performance tricks I could try? I am already wrapping the network stream with a BufferedInputStream. I also tried double buffering, with some success (see code below). Any solution that achieves better performance is welcome.

The performance test scenario

In my test scenario, the server and client are located on the same machine, and the server sends ~174 MB of data. The code snippets can be found at the end of this post. All numbers you see here are averages of 5 test runs.

First, I wanted to know how fast the InputStream of the HttpsURLConnection can be read. Wrapped in a BufferedInputStream, it took 26.250s to write the entire data into a ByteArrayOutputStream.¹

Then I tested the performance of my deserializer by passing it all 174 MB as a ByteArrayInputStream. Before I improved the deserializer's implementation, it took 38.151s; after the improvement it took only 23.466s.² So this is going to be it, I thought... but no.

What I actually want to do is pass connection.getInputStream() directly to the deserializer. And here comes the strange thing: before the deserializer improvement, deserializing took 61.413s, and after the improvement it took 60.100s!³

How can that happen? There is almost no improvement here, even though the deserializer itself improved significantly. Also, independent of that improvement, I was surprised that this takes longer than the two separate timings combined (60.100 > 26.250 + 23.466). Why? Don't get me wrong, I didn't expect this to be the best solution, but I didn't expect it to be that bad either.

So, three things to notice:

1. The overall speed is bound by the network, which takes at least 26.250s. Maybe there are some HTTP settings I could tweak, or I could further optimize the server, but for now this is likely not what I should focus on.
2. My deserializer implementation is very likely still not perfect, but on its own it is faster than the network, so I don't think it needs further improvement.
3. Based on 1. and 2. I'm assuming that it should be somehow possible to do the entire job in a combined way (reading from the network + deserializing) which should take not much more than 26.250s. Any suggestions on how to achieve this are welcome.
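The assumption in the last point can be put in numbers. If reading and deserializing run one after the other, their times add up; if they overlap perfectly in two threads, the slower stage sets the pace. This is an idealized model that ignores the cost of handing data between threads and any CPU contention between them:

$$T_{\text{sequential}} = T_{\text{net}} + T_{\text{deser}} = 26.250\,\text{s} + 23.466\,\text{s} = 49.716\,\text{s}$$

$$T_{\text{pipelined}} \approx \max(T_{\text{net}},\, T_{\text{deser}}) = \max(26.250\,\text{s},\, 23.466\,\text{s}) = 26.250\,\text{s}$$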

I was looking for some kind of double buffer that allows two threads to read from it and write to it in parallel. Is there something like that in standard Java? Preferably some class inheriting from InputStream that also allows writing to it in parallel? If there is something similar that does not inherit from InputStream, I may be able to change my DataDeserializer to consume from that instead.

As I haven't found any such DoubleBufferInputStream, I implemented one myself. The code is quite long and likely not perfect, and I don't want to burden you with a code review. It has two 16 KB buffers. Using it, I was able to improve the overall performance to 39.885s.⁴ That is much better than 60.100s but still much worse than 26.250s. Choosing different buffer sizes didn't change much. So I hope someone can point me to a good double buffer implementation.
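For comparison, standard Java does include a producer/consumer byte pipe: java.io.PipedOutputStream connected to java.io.PipedInputStream, whose reading end is an ordinary InputStream that a deserializer can consume while another thread writes into it. It is a single circular buffer rather than two swapped buffers, and its internal wait/notify logic is fairly coarse, so whether it beats a hand-written double buffer is something to measure rather than assume. A minimal sketch, where pipeInBackground, the 16 KB copy chunk, and the pipe size are illustrative choices and a ByteArrayInputStream stands in for connection.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedStreamExample {

    // Starts a background thread that copies `source` into a pipe and returns the
    // pipe's reading end, usable by a consumer (e.g. a deserializer) as a plain InputStream.
    static InputStream pipeInBackground(InputStream source, int pipeSize) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, pipeSize); // pipeSize = internal buffer size in bytes
        Thread producer = new Thread(() -> {
            byte[] chunk = new byte[16 * 1024];
            // Closing `sink` (also on error) lets the reader see end of stream once the pipe is drained.
            try (InputStream src = source; PipedOutputStream sink = out) {
                int n;
                while ((n = src.read(chunk)) >= 0) {
                    sink.write(chunk, 0, n);
                    sink.flush(); // notifies a reader that is waiting for data
                }
            } catch (IOException e) {
                e.printStackTrace(); // a real client should report this to the consumer instead
            }
        }, "network-reader");
        producer.setDaemon(true);
        producer.start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1_000_000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) i;

        // Stand-in for connection.getInputStream().
        InputStream in = pipeInBackground(new ByteArrayInputStream(payload), 64 * 1024);
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buf = new byte[8 * 1024];
        int n;
        while ((n = in.read(buf)) >= 0) received.write(buf, 0, n);
        System.out.println(received.size() + " bytes received");
    }
}
```

The reading and writing ends must be used from different threads; a single thread doing both can deadlock once the pipe fills up.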

The test code

1 (26.250s)

InputStream inputStream = new BufferedInputStream(connection.getInputStream());
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

byte[] buffer = new byte[16 * 1024];
int count = 0;

long start = System.nanoTime();
while ((count = inputStream.read(buffer)) >= 0) {
    outputStream.write(buffer, 0, count);
}
long end = System.nanoTime();

2 (23.466s)

InputStream inputStream = new ByteArrayInputStream(entire174MBbuffer);
DataDeserializer deserializer = new DataDeserializer(inputStream);

long start = System.nanoTime();
deserializer.deserialize();
long end = System.nanoTime();

3 (60.100s)

InputStream inputStream = new BufferedInputStream(connection.getInputStream());
DataDeserializer deserializer = new DataDeserializer(inputStream);

long start = System.nanoTime();
deserializer.deserialize();
long end = System.nanoTime();

4 (39.885s)

MyDoubleBufferInputStream doubleBufferInputStream = new MyDoubleBufferInputStream();

new Thread(new Runnable() {

    @Override
    public void run() {

        try (InputStream inputStream = new BufferedInputStream(connection.getInputStream())) {
            byte[] buffer = new byte[16 * 1024];
            int count = 0;
            while ((count = inputStream.read(buffer)) >= 0) {
                doubleBufferInputStream.write(buffer, 0, count);
            }
        } catch (IOException e) {
            e.printStackTrace(); // don't silently swallow network errors
        } finally {
            doubleBufferInputStream.closeWriting(); // read() may return -1 now
        }
    }

}).start();

DataDeserializer deserializer = new DataDeserializer(doubleBufferInputStream);
long start = System.nanoTime();
deserializer.deserialize();
long end = System.nanoTime();
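The MyDoubleBufferInputStream used above is not shown. As a point of comparison, here is a minimal sketch of a hand-off stream with the same write()/closeWriting() shape, built on a bounded java.util.concurrent.ArrayBlockingQueue of byte chunks instead of two explicitly swapped buffers. This is a swapped-in technique, not the author's implementation, and the class name ChunkQueueInputStream is hypothetical. With a queue capacity of 2 it behaves much like a double buffer: the producer blocks while both slots are full and the consumer blocks while both are empty.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical hand-off stream: one thread writes chunks, another reads them as an InputStream.
public class ChunkQueueInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // sentinel ("poison pill") marking end of data

    private final BlockingQueue<byte[]> queue;
    private byte[] current = new byte[0]; // chunk currently being consumed
    private int pos = 0;                  // read position inside `current`
    private boolean finished = false;     // true once the EOF sentinel has been taken

    public ChunkQueueInputStream(int maxQueuedChunks) {
        queue = new ArrayBlockingQueue<>(maxQueuedChunks);
    }

    // Producer side: copies the bytes (so the caller may reuse its buffer); blocks while the queue is full.
    public void write(byte[] b, int off, int len) throws InterruptedIOException {
        if (len <= 0) return;
        try {
            queue.put(Arrays.copyOfRange(b, off, off + len));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while handing off data");
        }
    }

    // Producer side: signals end of data; read() returns -1 once everything has been consumed.
    public void closeWriting() {
        boolean interrupted = false;
        while (true) {
            try {
                queue.put(EOF);
                break;
            } catch (InterruptedException e) {
                interrupted = true; // keep trying so the consumer is never left waiting forever
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    // Consumer side: blocks until unread bytes are available; returns false at end of data.
    private boolean ensureData() throws IOException {
        while (pos >= current.length) {
            if (finished) return false;
            try {
                byte[] next = queue.take();
                if (next == EOF) {
                    finished = true;
                    return false;
                }
                current = next;
                pos = 0;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for data");
            }
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return ensureData() ? current[pos++] & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!ensureData()) return -1;
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public int available() {
        return current.length - pos; // counts only the chunk currently being consumed
    }
}
```

Each chunk is copied into a fresh array, which keeps the hand-off simple at the cost of some allocation; a pool of reusable buffers would avoid that garbage if it shows up in profiling.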

Update

As requested, here is the core of my deserializer. I think the most important method is prepareForRead(), which performs the actual reading of the stream.

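import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DataDeserializer {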
    private InputStream _stream;
    private ByteBuffer _buffer;

    public DataDeserializer(InputStream stream) {
        _stream = stream;
        _buffer = ByteBuffer.allocate(256 * 1024);
        _buffer.order(ByteOrder.LITTLE_ENDIAN);
        _buffer.flip();
    }

    private int readInt() throws IOException {
        prepareForRead(4);
        return _buffer.getInt();
    }
    private long readLong() throws IOException {
        prepareForRead(8);
        return _buffer.getLong();
    }
    private CustomObject readCustomObject() throws IOException {
        prepareForRead(/*size of CustomObject*/);
        int customMember1 = _buffer.getInt();
        long customMember2 = _buffer.getLong();
        // ...
        return new CustomObject(customMember1, customMember2, ...);
    }
    // several other built-in and custom object read methods

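    // Ensures that at least `count` unread bytes are available in _buffer
    // (between position and limit), refilling it from _stream as needed.
    private void prepareForRead(int count) throws IOException {
        while (_buffer.remaining() < count) {
            // Too little free space after the limit: move the unread bytes to
            // the front (compact) and switch back to read mode (flip).
            if (_buffer.capacity() - _buffer.limit() < count) {
                _buffer.compact();
                _buffer.flip();
            }

            // Append whatever the stream delivers into the free space after the limit.
            int read = _stream.read(_buffer.array(), _buffer.limit(), _buffer.capacity() - _buffer.limit());
            if (read < 0)
                throw new EOFException("Unexpected end of stream.");

            _buffer.limit(_buffer.limit() + read);
        }
    }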

    public HugeCustomObject deserialize() throws IOException {
        while (...) {
            // call several of the above methods
        }
        return new HugeCustomObject(/* deserialized members */);
    }
}
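To check the refill logic in isolation, here is a self-contained sketch that mirrors prepareForRead() with the same compact()/flip() sequence. The class name RefillingReader and the explicit guard against requests larger than the buffer are additions for illustration; without such a guard, a request larger than the capacity would make the loop keep issuing zero-length reads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stripped-down reader that mirrors the refill logic of prepareForRead().
public class RefillingReader {
    private final InputStream stream;
    private final ByteBuffer buffer;

    public RefillingReader(InputStream stream, int capacity) {
        this.stream = stream;
        this.buffer = ByteBuffer.allocate(capacity);
        this.buffer.order(ByteOrder.LITTLE_ENDIAN);
        this.buffer.flip(); // start in "read mode" with no unread bytes
    }

    public int readInt() throws IOException {
        ensure(4);
        return buffer.getInt();
    }

    public long readLong() throws IOException {
        ensure(8);
        return buffer.getLong();
    }

    // Makes sure at least `count` unread bytes lie between position and limit.
    private void ensure(int count) throws IOException {
        if (count > buffer.capacity()) {
            throw new IllegalArgumentException("request exceeds buffer capacity: " + count);
        }
        while (buffer.remaining() < count) {
            if (buffer.capacity() - buffer.limit() < count) {
                buffer.compact(); // move unread bytes to the front (buffer is now in write mode)
                buffer.flip();    // back to read mode: position = 0, limit = unread byte count
            }
            int read = stream.read(buffer.array(), buffer.limit(), buffer.capacity() - buffer.limit());
            if (read < 0) throw new EOFException("Unexpected end of stream.");
            buffer.limit(buffer.limit() + read);
        }
    }
}
```

Feeding it a stream that returns a single byte per read() call is a cheap way to exercise refills across chunk boundaries.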

Update 2

I modified my code snippet #1 a little bit to see more precisely where time is being spent:

InputStream inputStream = new BufferedInputStream(connection.getInputStream());
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[16 * 1024];

long read = 0;
long write = 0;
while (true) {
    long t1 = System.nanoTime();
    int count = inputStream.read(buffer);
    long t2 = System.nanoTime();
    read += t2 - t1;
    if (count < 0)
        break;
    t1 = System.nanoTime();
    outputStream.write(buffer, 0, count);
    t2 = System.nanoTime();
    write += t2 - t1;
}
System.out.println(read + " " + write);

This tells me that reading from the network stream takes 25.756s, while writing to the ByteArrayOutputStream takes only 0.817s. This makes sense, as these two numbers almost perfectly sum up to the previously measured 26.250s (plus some additional measuring overhead).

In the very same way I modified code snippet #4:

MyDoubleBufferInputStream doubleBufferInputStream = new MyDoubleBufferInputStream();

new Thread(new Runnable() {

    @Override
    public void run() {
        try (InputStream inputStream = new BufferedInputStream(httpChannelOutputStream.getConnection().getInputStream(), 256 * 1024)) {
            byte[] buffer = new byte[16 * 1024];

            long read = 0;
            long write = 0;
            while (true) {
                long t1 = System.nanoTime();
                int count = inputStream.read(buffer);
                long t2 = System.nanoTime();
                read += t2 - t1;
                if (count < 0)
                    break;
                t1 = System.nanoTime();
                doubleBufferInputStream.write(buffer, 0, count);
                t2 = System.nanoTime();
                write += t2 - t1;
            }
            System.out.println(read + " " + write);
        } catch (IOException e) {
            e.printStackTrace(); // don't silently swallow network errors
        } finally {
            doubleBufferInputStream.closeWriting();
        }
    }

}).start();

DataDeserializer deserializer = new DataDeserializer(doubleBufferInputStream);
deserializer.deserialize();

Now I would expect the measured reading time to be exactly the same as in the previous example. Instead, the read variable holds a value of 39.294s (how is that possible? It's exactly the same code being measured as in the previous example, which took 25.756s!*), while writing to my double buffer takes only 0.096s. Again, these numbers almost perfectly sum up to the measured time of code snippet #4. Additionally, I profiled this very same code using Java VisualVM. It tells me that 40s were spent in this thread's run() method and that 100% of these 40s are CPU time. On the other hand, 40s are also spent inside the deserializer, but there only 26s are CPU time and 14s are spent waiting. This perfectly matches the time of reading from the network into the ByteArrayOutputStream. So I guess I have to improve my double buffer's "buffer-switching algorithm".

* Is there any explanation for this strange observation? I can only imagine that this way of measuring is very inaccurate. However, the read and write times of the latest measurements sum up almost perfectly to the original measurement, so it cannot be that inaccurate... Could someone please shed some light on this? I was not able to find these read and write timings in the profiler; I will try to find settings that let me see the profiling results for these two methods.

Answer 1:

Apparently, my "mistake" was using a 32-bit JVM (jre1.8.0_172, to be precise). Running the very same code snippets on a 64-bit JVM, and tadaaa... it is fast and everything makes sense.

In particular see these new numbers for the corresponding code snippets:

snippet #1: 4.667s (vs. 26.250s)
snippet #2: 11.568s (vs. 23.466s)
snippet #3: 17.185s (vs. 60.100s)
snippet #4: 12.336s (vs. 39.885s)

So apparently, the answers given to "Does Java 64 bit perform better than the 32-bit version?" are simply not true anymore. Or there is a serious bug in this particular 32-bit JRE version. I haven't tested any others yet.
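Since the outcome hinges on 32-bit vs 64-bit, it can help to print which JVM a benchmark actually runs on, for example at the start of each test run. A small sketch using system properties; note that sun.arch.data.model is HotSpot-specific and may be null on other JVMs, and that on a 32-bit JVM running on 64-bit Windows, os.arch typically reports the JVM's architecture (e.g. x86) rather than the operating system's.

```java
public class JvmInfo {

    // Summarizes which JVM the current process runs on.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vm.name=" + System.getProperty("java.vm.name")
                + ", os.arch=" + System.getProperty("os.arch")
                // HotSpot-specific property ("32" or "64"); may be null on other JVMs.
                + ", sun.arch.data.model=" + System.getProperty("sun.arch.data.model");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```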

As you can see, #4 is only slightly slower than #2, which perfectly matches my original assumption that

Based on 1. and 2. I'm assuming that it should be somehow possible to do the entire job in a combined way (reading from the network + deserializing) which should take not much more than 26.250s.

Also, the very weird results of my profiling approach described in Update 2 of my question no longer occur. I haven't repeated every single test on 64-bit yet, but all the profiling results I did repeat are plausible now, i.e. the same code takes the same time no matter which code snippet it is in. So maybe it's really a bug, or does anybody have a reasonable explanation?

Answer 2:

The most certain way to improve any of these is to change

connection.getInputStream()

to

new BufferedInputStream(connection.getInputStream())

If that doesn't help, the input stream isn't your problem.

Source: https://stackoverflow.com/questions/51020591/how-to-improve-performance-of-deserializing-objects-from-httpsurlconnection-geti

Tags

java

performance

inputstream