I was having an issue where some sites were taking a long time to load (by "long time" I mean up to 16 seconds). Sometimes they would time out.
Regarding your answer to 5, I believe what Gunicorn recommends is overkill.
I recently did some ad-hoc testing with the number of workers and found that, assuming you have enough RAM, the 2*cores+1 rule of thumb is pretty accurate. Requests/sec increased almost linearly until I got close to that number, then dropped off as the OS started to thrash.
Since results depend greatly on workload, try different values and see where your performance peaks.
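For reference, here is a minimal sketch of how the 2*cores+1 starting point could be written in a Gunicorn config file (commonly named gunicorn.conf.py), which is an ordinary Python module. The helper name `suggested_workers` is my own and not part of Gunicorn; only the module-level `workers` variable is a Gunicorn setting. Treat the result as a baseline to benchmark around, not a final answer.

```python
import multiprocessing


def suggested_workers(cores: int) -> int:
    """Rule-of-thumb starting point for the worker count: 2 * cores + 1.

    Hypothetical helper for illustration; not a Gunicorn API.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores + 1


# Gunicorn reads the module-level `workers` variable from its config file.
# Assumes the machine has enough RAM to hold this many worker processes.
workers = suggested_workers(multiprocessing.cpu_count())
```

On a 4-core machine this sets `workers` to 9. To find where your own workload peaks, override the value (for example with `gunicorn -w 6 ...`) and compare requests/sec across a few runs.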