Using NLTK corpora with AWS Lambda functions in Python

Backend · Unresolved · 4 answers · 1796 views

遥遥无期 asked on 2021-01-02 02:20

I'm encountering a difficulty when using NLTK corpora (in particular the stop words) in AWS Lambda. I'm aware that the corpora need to be downloaded, and I have done so with NLTK.

4 Answers
  •  长发绾君心
    2021-01-02 02:56

    On AWS Lambda you need to bundle the nltk Python package in your deployment package and modify nltk's data.py, changing the search paths from:

    path += [
        str('/usr/share/nltk_data'),
        str('/usr/local/share/nltk_data'),
        str('/usr/lib/nltk_data'),
        str('/usr/local/lib/nltk_data')
    ]
    

    to

    path += [
        str('/var/task/nltk_data')
        #str('/usr/share/nltk_data'),
        #str('/usr/local/share/nltk_data'),
        #str('/usr/lib/nltk_data'),
        #str('/usr/local/lib/nltk_data')
    ]
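
    An alternative to patching data.py is to extend the search path at runtime. Below is a minimal sketch of a handler that appends the bundled data directory to `nltk.data.path` before using the stopwords corpus; the handler name and the assumption that `nltk_data/` sits at the root of the deployment zip (which Lambda unpacks to `/var/task`) are illustrative, not from the original answer.

    ```python
    import os
    import nltk

    # LAMBDA_TASK_ROOT is set to /var/task inside Lambda; fall back to the
    # current directory so the same code also runs locally.
    NLTK_DATA_DIR = os.path.join(os.environ.get("LAMBDA_TASK_ROOT", "."), "nltk_data")
    nltk.data.path.append(NLTK_DATA_DIR)

    def lambda_handler(event, context):
        # Hypothetical handler: strip English stop words from the input text.
        from nltk.corpus import stopwords
        stops = set(stopwords.words("english"))
        words = event.get("text", "").split()
        return {"tokens": [w for w in words if w.lower() not in stops]}
    ```

    Appending to `nltk.data.path` avoids maintaining a patched copy of the library, at the cost of one extra line in the handler module.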
    

    You can't include the entire nltk_data directory: delete all the zip files, and if you only need stopwords, keep nltk_data -> corpora -> stopwords and dump the rest. If you need tokenizers, keep nltk_data -> tokenizers -> punkt. To download the nltk_data folder, use an Anaconda Jupyter notebook and run

    nltk.download()

    or

    https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip

    or

    python -m nltk.downloader all
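
    The pruning step above can be scripted. The sketch below copies only the subtrees you want to ship (stopwords by default) into a fresh directory, which automatically leaves the zip archives and everything else behind; the function name and paths are placeholders, not part of NLTK.

    ```python
    import shutil
    from pathlib import Path

    def prune_nltk_data(src: str, dst: str, keep=("corpora/stopwords",)) -> None:
        """Copy only the listed nltk_data subtrees from src to dst.

        Everything not listed in `keep` (zip archives, unused corpora,
        tokenizers, etc.) is simply not copied, keeping the Lambda
        deployment package small.
        """
        for rel in keep:
            shutil.copytree(Path(src) / rel, Path(dst) / rel, dirs_exist_ok=True)
    ```

    For example, `prune_nltk_data("~/nltk_data", "build/nltk_data", keep=("corpora/stopwords", "tokenizers/punkt"))` would keep both stopwords and the punkt tokenizer.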
    
