Here are some things I have found which could be significant.
- JSON-RPC can generate a lot of objects. Not as many as XML-RPC, but still something to watch for. In any case you do appear to be generating as much as 100 MB of objects per second, which means your GC is running a high percentage of the time and is likely to be adding to your random latency. Even though the GC is concurrent, your hardware/OS is very likely to exhibit non-ideal random latency under load.
- Have a look at your memory bank architecture. On Linux the command is numactl --hardware. If your VM is being split across more than one memory bank, this will increase your GC times significantly. (It will also slow down your application, as accesses to the other bank can be significantly less efficient.) The harder you work the memory subsystem, the more likely the OS will have to shift memory around (often in large amounts), and you get dramatic pauses as a result (100 ms is not surprising). Don't forget your OS does more than just run your app.
- Consider compacting/reducing the memory consumption of your cache. If you are using multiple GB of cache it is worth looking at ways to cut memory consumption further than you have already.
- I suggest you profile your app with memory allocation tracing AND CPU sampling on at the same time. The two views can yield very different results, and comparing them often points to the cause of this sort of problem.
Using these approaches, the latency of an RPC call can be reduced to below 200 microseconds and the GC times reduced to 1-3 ms, affecting less than 1/300 of calls.
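
To check how much of your run time is actually going to GC before and after any of these changes, something like the following rough sketch may help. It reads the accumulated collection time from the standard `GarbageCollectorMXBean`s and compares it with wall-clock time. The class name `GcOverhead`, the helper names and the allocation loop are made up for illustration; the loop is only a stand-in for your real RPC workload, and the numbers it prints will vary by JVM, collector and heap settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    /** Sum of accumulated collection time (ms) over all collectors the JVM reports. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of wall-clock time spent in GC; 0 if no time has elapsed. */
    static double gcFraction(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcTimeMillis();
        long start = System.nanoTime();

        // Hypothetical stand-in workload: lots of short-lived objects.
        // Replace this with the code path you actually want to measure.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] b = new byte[128];
            b[0] = (byte) i;
            sink += b[0];
        }

        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcTimeMillis() - gcBefore;

        System.out.printf("wall=%d ms, gc=%d ms, gc fraction=%.3f (sink=%d)%n",
                wallMs, gcMs, gcFraction(gcMs, wallMs), sink);
    }
}
```

This only gives a coarse total; for individual pause times, GC logging or a profiler is still the better tool.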