We wrote the simplest possible TCP server (with minor logging) to examine its memory footprint (see tcp-server.go below).
The server simply accepts connections and do
The answer is unfortunately pretty simple: goroutine stacks can't currently be released.
Since you're connecting 10k clients at once, you need 10k goroutines to handle them. Each goroutine has an 8k stack, and even if only the first page of each stack is faulted in (4k, assuming the usual page size), that is 10k x 4k, so you still need at least 40M of permanent memory to handle your max connections.
There are some pending changes that may help in Go 1.4 (like 4k stacks), but for now it's a fact we have to live with.