This assumes you really want to save every microsecond. Most applications don't have such strict requirements.
If you want to save microseconds, you will want to use busy-waiting, non-blocking NIO with threads on dedicated CPUs. This doesn't scale well, as you need plenty of CPUs, but it does minimise the latency of handling IO. I suggest you also bind those threads to isolated CPUs to minimise jitter.
You will want to avoid using Selectors, as they block and/or create quite a bit of garbage, adding to GC pauses.
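As a rough illustration of busy waiting on a non-blocking channel without a Selector, here is a minimal sketch. The class and method names (`BusySpinReader`, `spinRead`) are made up for this example, and a `Pipe` stands in for the `SocketChannel` you would use in practice (also switched to non-blocking mode with `configureBlocking(false)`). CPU pinning is not shown because the JDK has no standard API for it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;

public class BusySpinReader {
    /**
     * Spins on a non-blocking channel until buf is full or the channel
     * reaches end-of-stream. Returns the number of bytes read.
     * (Hypothetical helper, not part of any library.)
     */
    public static int spinRead(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf);      // non-blocking: returns 0 at once if nothing is available
            if (n < 0) break;          // end of stream
            if (n == 0) {
                Thread.onSpinWait();   // Java 9+: a CPU hint only, the thread is never parked
                continue;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();                     // stand-in for a real socket
        pipe.source().configureBlocking(false);      // reads must not block
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));

        ByteBuffer buf = ByteBuffer.allocateDirect(4); // allocate once and reuse to avoid garbage
        int n = spinRead(pipe.source(), buf);
        System.out.println("read " + n + " bytes");
    }
}
```

The loop never yields the CPU, which is why this only makes sense when the thread has a core to itself.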
Also, to minimise latency you will want to use a low-latency, kernel-bypass network adapter such as Solarflare.
You will want to use a push parser so long messages can be decoded/parsed as they download, i.e. you won't want to wait until the whole message is received before starting.
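To make the push-parser idea concrete, here is a minimal sketch. The wire format (ASCII fields separated by `,`, messages terminated by `\n`) and all of the names (`PushParserSketch`, `PushParser`, `Handler`) are assumptions invented for the example. The point is only that each field is handed to the callback as soon as its last byte arrives, even if the rest of the message is still in flight.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PushParserSketch {
    /** Callbacks fired as soon as each piece of a message has been decoded. */
    public interface Handler {
        void onField(CharSequence field); // field is only valid during the call
        void onMessageEnd();
    }

    /** Incremental parser for the assumed format: ',' between fields, '\n' after each message. */
    public static final class PushParser {
        // Reused buffer; carries a partially received field across chunks.
        private final StringBuilder current = new StringBuilder();
        private final Handler handler;

        public PushParser(Handler handler) { this.handler = handler; }

        /** Consumes whatever bytes are available; any fragment of a message is acceptable. */
        public void push(ByteBuffer chunk) {
            while (chunk.hasRemaining()) {
                char c = (char) (chunk.get() & 0xFF); // assumes an ASCII payload
                if (c == ',' || c == '\n') {
                    handler.onField(current);
                    current.setLength(0);
                    if (c == '\n') handler.onMessageEnd();
                } else {
                    current.append(c);
                }
            }
        }
    }

    /** Feeds one message in two fragments and records the callbacks in order. */
    public static List<String> demo() {
        List<String> events = new ArrayList<>();
        PushParser parser = new PushParser(new Handler() {
            @Override public void onField(CharSequence f) { events.add("field:" + f); }
            @Override public void onMessageEnd() { events.add("end"); }
        });
        // First fragment: "BUY" is delivered here, before the message is complete.
        parser.push(ByteBuffer.wrap("BUY,10".getBytes(StandardCharsets.US_ASCII)));
        // Second fragment completes the split field "100" and the rest of the message.
        parser.push(ByteBuffer.wrap("0,IBM\n".getBytes(StandardCharsets.US_ASCII)));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a latency-sensitive system the handler would act on each field directly rather than build strings, since the `String` concatenation in `demo()` creates garbage and is there only to make the callback order visible.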
Using these tricks in combination can save 10 to 30 microseconds off every request or inbound event.
Netty is a better solution for scalability, i.e. higher net throughput, but at a small cost to latency. The same is true of most frameworks, which are designed to support web services where millisecond delays are tolerable.