I was experimenting with concurrent request handling on a few platforms.
The aim of the experiment was to have a broad measure of the capacity bounds of s
To minimize costs, you need to configure a few things in app.yaml:
- threadsafe: true - this actually comes from the Python config and does not apply to Go, but I would set it just in case.
- max_concurrent_requests - set it to the maximum, 80.
- max_idle_instances - set it to the minimum (0, or the lowest value the configuration accepts).
- max_pending_latency - set it to automatic, or to a value greater than min_pending_latency.
- min_idle_instances - set it to 0.
- min_pending_latency - set it to a higher number. If you are OK with 1 second of latency and your handlers take 100ms on average to process a request, set it to 900ms.

Then you should be able to process a lot of requests on a single instance.
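
As a rough sketch, the scaling settings described above might look like the following fragment of app.yaml. This is only an illustration under a few assumptions: the settings sit under the automatic_scaling section, the rest of the file (runtime, handlers and so on) is left as you already have it, and max_idle_instances is written as 1 because, as far as I know, the reference documentation lists 1 as its lowest accepted value.

```yaml
# Fragment only: merge into your existing app.yaml.
automatic_scaling:
  # Let one instance serve as many requests in parallel as allowed.
  max_concurrent_requests: 80
  # Do not keep idle instances around just for responsiveness.
  min_idle_instances: 0
  # Assumption: 1 is used here as the lowest accepted value.
  max_idle_instances: 1
  # 1s latency budget minus ~100ms average handler time.
  min_pending_latency: 900ms
  # Must not be lower than min_pending_latency; automatic is fine.
  max_pending_latency: automatic
```

The 900ms figure is just the example budget from the text; adjust it to your own acceptable latency and measured handler time.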
If you are OK with burning cash for the sake of responsiveness and scalability, increase min_idle_instances and max_idle_instances.
Also, are you using similar instance types for the VM and GAE? The GAE F1 instance is not very fast and is better suited to async tasks such as working with IO (datastore, HTTP, etc.). You can configure a more powerful instance class to scale better for computation-intensive tasks.
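
For illustration, selecting a larger instance class is a one-line setting in app.yaml. F2 below is only an example choice, not a recommendation; which class matches your VM is something you would need to check against the published instance class specifications.

```yaml
# Fragment only: merge into your existing app.yaml.
# F1 is the default for automatic scaling; F2 and F4 offer more CPU and memory at a higher hourly cost.
instance_class: F2
```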
Also, are you testing on a paid account? Free accounts have quotas, and App Engine will refuse a percentage of requests if it believes the load would exceed the daily quota should the same pattern continue.