Unable to download nltk data

倖福魔咒の submitted on 2019-12-04 22:35:11

Question


import nltk
nltk.download()

The download fails with [SSL: CERTIFICATE_VERIFY_FAILED]. With requests one can pass verify=False, but what is the equivalent here?

UPDATE:

This error persists on Python 3.6, with NLTK 3.0, on Mac OS X 10.7.5:

Changing the index in the NLTK downloader (as suggested in a linked answer) lets the downloader list all of NLTK's files, but trying to download all of them produces another SSL error.


Answer 1:


I had the same problem when trying to configure both NLTK and spaCy. Following the instructions in a linked question, I was able to overcome the issue. Try running /Applications/Python\ 3.6/Install\ Certificates.command, then retry your NLTK download.
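To check whether the certificate script is likely to help, it can be useful to see where Python's ssl module looks for CA certificates. The following is only a diagnostic sketch using the standard library; the helper name describe_ca_locations is made up for illustration and is not part of NLTK or of the installer.

```python
import os
import ssl


def describe_ca_locations():
    """Return where this Python's ssl module looks for CA certificates.

    describe_ca_locations is a hypothetical helper name, not an NLTK API.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # cafile/capath are None when the default location does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # The location compiled into OpenSSL, whether or not it exists.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_cafile_exists": os.path.exists(paths.openssl_cafile),
    }


if __name__ == "__main__":
    for name, value in describe_ca_locations().items():
        print(f"{name}: {value}")
```

If cafile is None and openssl_cafile_exists is False, Python probably has no CA bundle to verify against, which would be consistent with the CERTIFICATE_VERIFY_FAILED error above.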




Answer 2:


On macOS 10.12.6 this was solved by entering the following in the bash terminal:

pip install certifi
/Applications/Python\ 3.6/Install\ Certificates.command

The usual method of installing NLTK corpora then worked for me:

import nltk
nltk.download()
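If running the certificate script is not possible, a similar effect can probably be had from inside Python by giving urllib an HTTPS handler backed by an explicit CA bundle, still with verification switched on. This is a sketch only: build_verified_opener is a hypothetical helper name, and it assumes that NLTK's downloader fetches files through urllib.request.urlopen, so that a globally installed opener applies to it.

```python
import ssl
import urllib.request


def build_verified_opener(cafile=None):
    """Build a urllib opener that verifies HTTPS against the given CA bundle.

    With cafile=None the system default certificates are used.
    build_verified_opener is a hypothetical helper, not an NLTK API.
    """
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)


# Possible usage, assuming certifi and nltk are installed:
#
#   import certifi
#   import nltk
#   urllib.request.install_opener(build_verified_opener(certifi.where()))
#   nltk.download("punkt")
```

Unlike verify=False in requests, this keeps certificate verification on and only changes which CA bundle is trusted.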



Answer 3:


If you want to download the data manually, for example because you need the tokenizers/punkt data, you can fetch the archive directly from:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip

and place the extracted punkt folder in C:\nltk_data\tokenizers.
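The extraction step can also be scripted. Below is a minimal sketch that unpacks an already downloaded archive into the right subfolder. The function name install_nltk_zip and its parameters are illustrative assumptions, and the archive is assumed to contain a top-level punkt folder.

```python
import os
import zipfile


def install_nltk_zip(zip_path, nltk_data_dir, category="tokenizers"):
    """Extract a manually downloaded NLTK archive into <nltk_data_dir>/<category>.

    install_nltk_zip is a hypothetical helper, not part of NLTK.
    Returns the directory the archive was extracted into.
    """
    target = os.path.join(nltk_data_dir, category)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Possible usage on Windows, matching the path in the answer above:
#
#   install_nltk_zip(r"C:\Downloads\punkt.zip", r"C:\nltk_data")
```

If the data ends up in a non-default location, appending that directory to nltk.data.path before loading the resource should let NLTK find it.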




Answer 4:


(Adding "certificate verify failed _ssl.c:749" for SEO of this issue.)

Solved for me on macOS 10.12.2 by using Paul Barry's tip of downloading via Python 2.7 (I can't comment because rep < 50).

Additional problems encountered and fixed: to be able to download the NLTK data via Python 2.7 (the default Mac Python 2.7 setup), I also had to add the Python folder to the path in ~/.bash_profile, as a linked comment shows.

Then, since I had set this path variable for 2.7, I had to remove it once the corpora were downloaded to be able to start python3. So remove it from ~/.bash_profile before starting python3.

After all that, I can run "import nltk" and "from nltk.book import *" without issues.




Answer 5:


OK, it's a bit of a hack, but here's what I had to do to be able to use the various NLTK data files in Python 3.x on my Mac laptop (running macOS 10.12.2).

Firstly, note that the certificate error only occurs when I try to download NLTK data using Python 3.x on my Mac (my Ubuntu VM inside of VirtualBox had no such error when using Python 3.x - which is annoying). Just why this causes an error on my Mac is beyond me, especially as the NLTK module installs into Python 3.x using pip with no issues. It's the connection to NLTK's download server which appears to cause the SSL verification issue.

My 'aha!' moment came when I realised that NLTK, whether installed into Python 3.x or Python 2.x, looks for its data in the same directories for every version of Python installed on the computer. So I used the Python 2.x that comes pre-installed on macOS to install NLTK, then ran nltk.download() within Python 2.x to install the stopwords corpus with no issues. Having done this (in Python 2.x), I went back into Python 3.x, and this code worked:

import nltk
from nltk.corpus import stopwords
print(stopwords.words('english'))

As I said, it's a bit of a hack, but this technique lets me get the NLTK data installed using Python 2.x, which I can then process with Python 3.x as required.
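To confirm from Python 3.x that the data downloaded under Python 2.x is visible, one can ask NLTK where it finds a resource. This is a small sketch: find_shared_data is a hypothetical helper name, and it returns None rather than raising when NLTK or the resource is missing.

```python
def find_shared_data(resource="corpora/stopwords"):
    """Return the location of an NLTK resource as a string, or None.

    find_shared_data is a hypothetical helper; it wraps nltk.data.find,
    which raises LookupError when the resource is not on nltk.data.path.
    """
    try:
        import nltk
    except ImportError:
        return None  # NLTK is not installed for this interpreter
    try:
        return str(nltk.data.find(resource))
    except LookupError:
        return None  # resource not found in any searched directory


if __name__ == "__main__":
    print(find_shared_data())
```

Printing nltk.data.path in each interpreter lists the directories that are searched, which makes it easy to see the location both Python versions have in common.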



Source: https://stackoverflow.com/questions/38725583/unable-to-download-nltk-data
