I'm encountering a difficulty when using NLTK corpora (in particular stop words) in AWS Lambda. I'm aware that the corpora need to be downloaded and have done so with NLTK
Another solution is to use Lambda's ephemeral storage, which is available at /tmp.
So, you would have something like this:
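import json
import nltk
from nltk.tokenize import word_tokenize
# /tmp is writable in Lambda, so add it to the paths NLTK searches for data
nltk.data.path.append("/tmp")
# Fetch the punkt tokenizer models into /tmp at runtime
nltk.download("punkt", download_dir="/tmp")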
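At runtime, punkt is downloaded to the /tmp directory, which is writable. However, this likely isn't a great solution under high concurrency: each new execution environment starts with its own empty /tmp, so every cold start repeats the download.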
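
One way to soften that cost is to skip the download whenever the data is already sitting in /tmp, which is the case on warm invocations of the same execution environment. The following is only a minimal sketch under a few assumptions: `ensure_resource`, `NLTK_DIR`, and the `text` key on the event are hypothetical names introduced for illustration, and the existence check relies on NLTK's usual layout of `<download_dir>/tokenizers/punkt`.

```python
import json
import os

# Hypothetical subdirectory of /tmp; any writable path would do.
NLTK_DIR = "/tmp/nltk_data"


def ensure_resource(name, subdir, download, data_dir=NLTK_DIR):
    """Download an NLTK resource unless data_dir already holds it.

    Returns True if a download was triggered, False if it was skipped.
    """
    target = os.path.join(data_dir, subdir, name)
    # nltk.download stores e.g. tokenizers/punkt.zip and unpacks it alongside.
    if os.path.exists(target) or os.path.exists(target + ".zip"):
        return False
    os.makedirs(data_dir, exist_ok=True)
    download(name, download_dir=data_dir, quiet=True)
    return True


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without NLTK installed.
    import nltk
    from nltk.tokenize import word_tokenize

    if NLTK_DIR not in nltk.data.path:
        nltk.data.path.append(NLTK_DIR)
    # "punkt" follows the snippet above; newer NLTK releases may expect "punkt_tab".
    ensure_resource("punkt", "tokenizers", nltk.download)
    tokens = word_tokenize(event.get("text", ""))
    return {"statusCode": 200, "body": json.dumps(tokens)}
```

A cold start still pays for one download, so this reduces the concurrency concern rather than removing it.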