We have developed an NIO networking library that achieves latencies under 2 microseconds over loopback without producing any garbage for the GC. As Peter Lawrey mentioned, the native JDK selector produces a lot of garbage, but we have eliminated all of that garbage by implementing our own epoll selector. Busy-waiting on the selector thread is great for latency, but it has to be balanced against overheating the CPU and wasting energy. Our selector implementation uses low-level tricks to implement a kind of energy-saving mode that takes care of that balance.
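To make the busy-wait versus energy trade-off concrete, here is a minimal sketch of a polling loop that backs off when the selector stays idle. It is not our selector implementation: it uses the stock JDK `Selector` (so it still produces garbage), and the class name `BackoffSelectorLoop`, the `modeFor` helper and the thresholds are all hypothetical values picked for illustration. It needs Java 9+ for `Thread.onSpinWait()`.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class BackoffSelectorLoop {

    // Hypothetical thresholds, chosen only for illustration.
    static final int SPIN_LIMIT = 1_000;    // empty polls before we start yielding
    static final int YIELD_LIMIT = 2_000;   // empty polls before we start parking
    static final long PARK_NANOS = 50_000L; // requested park time (50 microseconds)

    enum Mode { SPIN, YIELD, PARK }

    /** Maps the number of consecutive empty polls to an idle mode. */
    static Mode modeFor(int emptyPolls) {
        if (emptyPolls < SPIN_LIMIT) return Mode.SPIN;
        if (emptyPolls < YIELD_LIMIT) return Mode.YIELD;
        return Mode.PARK;
    }

    /** Polls the selector without blocking until keepRunning returns false. */
    static void run(Selector selector, BooleanSupplier keepRunning) throws IOException {
        int emptyPolls = 0;
        while (keepRunning.getAsBoolean()) {
            int ready = selector.selectNow(); // non-blocking poll
            if (ready > 0) {
                // The stock selected-key set typically allocates an iterator here,
                // so this sketch is NOT garbage-free.
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    // handle key.isReadable() / key.isWritable() here
                }
                emptyPolls = 0; // activity seen: go back to pure spinning
                continue;
            }
            switch (modeFor(emptyPolls)) {
                case SPIN:
                    Thread.onSpinWait(); // hint to the CPU that we are spinning
                    break;
                case YIELD:
                    Thread.yield(); // let other runnable threads have the core
                    break;
                case PARK:
                    LockSupport.parkNanos(PARK_NANOS); // sleep briefly to save energy
                    break;
            }
            if (emptyPolls < Integer.MAX_VALUE) emptyPolls++;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            int[] remaining = {2_100}; // bounded run so the demo terminates
            run(selector, () -> remaining[0]-- > 0);
            System.out.println("done");
        }
    }
}
```

The idea is that the loop pays the latency cost of yielding or parking only after a long quiet period, and drops back to pure spinning as soon as any key becomes ready. The right thresholds depend heavily on the workload and hardware.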
Besides CoralReactor, you can also take a look at Grizzly and MINA, but we haven't tried those frameworks ourselves yet.
For some Netty TCP performance benchmarks you can take a look here.