I'm encountering a difficulty when using NLTK corpora (in particular stop words) in AWS Lambda. I'm aware that the corpora need to be downloaded and have done so with NLTK
On AWS Lambda you need to bundle the nltk Python package with your Lambda deployment and modify nltk's data.py, changing:
path += [
str('/usr/share/nltk_data'),
str('/usr/local/share/nltk_data'),
str('/usr/lib/nltk_data'),
str('/usr/local/lib/nltk_data')
]
to
path += [
str('/var/task/nltk_data')
#str('/usr/share/nltk_data'),
#str('/usr/local/share/nltk_data'),
#str('/usr/lib/nltk_data'),
#str('/usr/local/lib/nltk_data')
]
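A minimal alternative sketch, assuming your nltk_data folder is packaged at the root of the deployment (which Lambda mounts at /var/task): instead of patching data.py, append the path at runtime before loading the corpus.

import nltk

# Assumption: nltk_data sits at the root of the Lambda package,
# which Lambda mounts at /var/task; adjust if yours differs.
nltk.data.path.append("/var/task/nltk_data")

from nltk.corpus import stopwords

def lambda_handler(event, context):
    # Example use: strip English stop words from the incoming text.
    words = event.get("text", "").split()
    stops = set(stopwords.words("english"))
    filtered = [w for w in words if w.lower() not in stops]
    return {"filtered": filtered}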
You can't include the entire nltk_data directory; delete all the zip files, and if you only need stop words, keep only nltk_data -> corpora -> stopwords and drop the rest. If you need tokenizers, keep nltk_data -> tokenizers -> punkt. To download the nltk_data folder, use an Anaconda Jupyter notebook (or any Python environment) and run one of the options below (a lighter-weight alternative is sketched after them):
nltk.download()
or
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip
or
python -m nltk.downloader all
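If you only need a couple of packages, a smaller-footprint sketch (assuming the download_dir parameter of nltk.download, and using stopwords and punkt as example packages) is to fetch just those into a local nltk_data folder that you then zip with your function:

import nltk

# Download only what the function actually needs into a local folder
# that will be packaged with the Lambda deployment; remember to delete
# the leftover .zip files afterwards, as noted above.
nltk.download("stopwords", download_dir="./nltk_data")
nltk.download("punkt", download_dir="./nltk_data")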