Heroku: deploying Deep Learning model

Posted by 岁酱吖の on 2021-02-07 06:17:27

Question


I have developed a REST API using Flask to expose a Python Keras Deep Learning model (a CNN for text classification). I have a very simple script that loads the model into memory and outputs class probabilities for a given text input. The API works perfectly locally.

However, when I git push heroku master, I get Compiled slug size: 588.2M is too large (max is 500M). The model is 83MB in size, which is quite small for a Deep Learning model. Notable dependencies include Keras and its tensorflow backend.

I know that you can use GBs of RAM and disk space on Heroku. But the bottleneck seems to be the slug size. Is there a way to circumvent this? Or is Heroku just not the right tool for deploying Deep Learning models?


Answer 1:


The first thing I would check, as suggested by others, is to find out why your repo is so big given that the model size is only 83MB.

Given that you cannot reduce the size, there is the option of offloading parts of the repo, but to do this you will still need an idea of which files are taking up the space. Offloading is suggested in the Heroku docs. Slug size is limited to 500MB, as stated here: https://devcenter.heroku.com/articles/slug-compiler#slug-size. I believe this has to do with the time it takes to spin up a new instance when a change in resources is needed. In any case, you can use offloading if you have particularly large files. More info on offloading here: https://devcenter.heroku.com/articles/s3
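To find out which files are actually taking up the space, you can list the largest blobs anywhere in git history (deleted files included). A minimal sketch, demonstrated here on a throwaway repo so it is self-contained; in practice you would run only the final pipeline inside your own repository:

```shell
# Build a throwaway repo containing a 1 MB dummy "model" file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
head -c 1048576 /dev/zero > big.bin
git add big.bin
git -c user.email=you@example.com -c user.name=you commit -qm "add model"

# List the largest blobs across ALL history, biggest first.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" {print $3, $4}' |
  sort -rn | head -10
# prints: 1048576 big.bin
```

Files that show up here but are no longer in your working tree are exactly the kind of history bloat the answer below describes.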




Answer 2:


This answer assumes that your model is only 83MB and the total size of your repository directory is smaller (likely much smaller) than 500MB.

There could be a few issues, but the obvious thing you need to do is reduce your git repository to less than 500MB.

First, try commands like the following to reduce the size of your repo (see this blog post for reference):

heroku plugins:install heroku-repo
heroku repo:gc --app your-app-name
heroku repo:purge_cache --app your-app-name

These might solve your issue.

Another potential issue is that at some point you committed another (large) model and removed it from your repo in a subsequent commit. The git repo still includes a version of that model in your .git folder and git history. There are a few fixes for this, but if you don't need your commit history, you can copy the repo to another folder and create a fresh git repo with git init. Commit everything with something like "Initial commit" and then try pushing this repo, with only one commit, to Heroku. That will likely be a much smaller repo.
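The fresh-history approach can be sketched as follows, using throwaway directories so the example is self-contained (substitute your real project path, and finish with the commented-out push to your Heroku remote):

```shell
set -e
src=$(mktemp -d)                  # stand-in for your existing project
echo "print('hello')" > "$src/app.py"

fresh=$(mktemp -d)                # copy of the project, minus git history
cp -R "$src/." "$fresh"
cd "$fresh"
rm -rf .git                       # drop any inherited history
git init -q .
git add .
git -c user.email=you@example.com -c user.name=you commit -qm "Initial commit"
# git push heroku master --force  # old large blobs are no longer in .git
```

Because the new repo has exactly one commit, none of the previously committed (and later deleted) model files travel with it.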




Answer 3:


A lot of these answers are great for reducing slug size, but if anyone still has problems deploying a deep learning model to Heroku, it is important to note that, for whatever reason, tensorflow 2.0 is ~500MB, whereas earlier versions are much smaller. Using an earlier version of tensorflow can greatly reduce your slug size.
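In practice that means pinning the version in your requirements.txt rather than taking whatever pip resolves. A sketch of such a file (the package list and the exact version are illustrative; pick a release your model was actually trained against):

```text
flask
keras
tensorflow==1.15.0
```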




Answer 4:


As a resource, you can visit the Heroku Slug Compiler help page.

An 83MB model file doesn't mean the slug stays at 83MB. Since packages are compiled when being pushed to Heroku, they eat up additional slug space so that they are ready for use by the application. The best solution is probably to put large assets in external storage such as AWS S3 (or a comparable service). Failing that, use a different cloud provider.
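Keeping the model out of the slug then means fetching it at dyno boot. A minimal sketch, assuming a MODEL_URL config var you would set yourself (e.g. with `heroku config:set MODEL_URL=...`); a private bucket would need a pre-signed URL or an SDK call instead of plain curl:

```shell
# Fetch the model from external storage at startup instead of shipping it
# in the slug. MODEL_URL is a hypothetical config var; if it is unset, the
# download step is simply skipped.
MODEL_PATH="/tmp/model.h5"
if [ -n "${MODEL_URL:-}" ] && [ ! -f "$MODEL_PATH" ]; then
  curl -fsSL -o "$MODEL_PATH" "$MODEL_URL"
fi
echo "model path: $MODEL_PATH"
```

Your Flask app would then load the model from MODEL_PATH once at startup, exactly as it does locally.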




Answer 5:


I would say that Heroku is not the right tool for deploying the deep learning model itself. For that, you could consider using a Platform as a Service dedicated to Deep Learning, such as Floydhub. You could deploy your Flask REST API on Floydhub too.




Answer 6:


You can reduce the model size and use tensorflow-cpu, which has a smaller footprint (about 144MB with Python 3.8):

pip install tensorflow-cpu

https://pypi.org/project/tensorflow-cpu/#files



Source: https://stackoverflow.com/questions/48840025/heroku-deploying-deep-learning-model
