I have web application written in Flask. As suggested by everyone, I can't use Flask in production. So I thought of Gunicorn with Flask.
In Flask application I am loading some Machine Learning models. These are of size 8GB collectively. Concurrency of my web application can go upto 1000 requests. And the RAM of machine is 15GB.
So what is the best way to run this application?
You can start your app with multiple workers or async workers with Gunicorn.
Flask server.py
from flask import Flask app = Flask(__name__) @app.route("/") def hello(): return "Hello World!" if __name__ == "__main__": app.run()
Gunicorn with gevent async worker
gunicorn server:app -k gevent --worker-connections 1000
Gunicorn 1 worker 12 threads:
gunicorn server:app -w 1 --threads 12
Gunicorn with 4 workers (multiprocessing):
gunicorn server:app -w 4
More information on Flask concurrency in this post: How many concurrent requests does a single Flask process receive?.