NLTK: set proxy server | 易学教程

Question

I'm trying to learn NLTK (the Natural Language Toolkit, written in Python), and I want to install a sample data set to run some examples.

My web connection uses a proxy server, and I'm trying to specify the proxy address as follows:

>>> nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))
>>> nltk.download()

But I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

I decided to set up a ProxyBasicAuthHandler before calling nltk.download():

import urllib2

auth_handler = urllib2.ProxyBasicAuthHandler(urllib2.HTTPPasswordMgrWithDefaultRealm())
auth_handler.add_password(realm=None, uri='http://proxy.example.com:3128/', user='USERNAME', passwd='PASSWORD')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)

import nltk
nltk.download()

But now I get HTTP Error 407 - Proxy Authentication Required.

The documentation says that if the proxy is set to None then this function will attempt to detect the system proxy. But it isn't working.

How can I install a sample data set for NLTK?

Answer 1:

The website where you got the code for your first attempt contains an error (I have seen the same one).

The faulty line is:

nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))

You need a comma to separate the arguments. The correct line should be

nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))

This will work just fine.
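
To see why the missing comma produces that particular TypeError: Python reads a string followed directly by parentheses as an attempt to call the string. Here is a minimal sketch that reproduces the error without NLTK (the variable names are only for illustration):

```python
# Without the comma, 'http://...' ('USERNAME', 'PASSWORD') is parsed as a
# call on the string itself, so the error is raised before set_proxy runs.
proxy = 'http://proxy.example.com:3128'

try:
    proxy('USERNAME', 'PASSWORD')  # same shape as the faulty line
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # prints: 'str' object is not callable
```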

Answer 2:

I was getting the same error too, but I found a solution that works perfectly. You need to download nltk_data manually and put it in /usr/share/nltk_data (or /usr/lib/nltk_data) on Linux, or in C:\nltk_data on Windows.
Here are the steps to follow:
1. Download the nltk_data zip file from this GitHub link:
https://github.com/nltk/nltk_data/tree/gh-pages
2. The data comes as a zip archive, so extract it.
3. For Ubuntu users in particular, the following command opens the file manager with root privileges:
sudo nautilus. This makes it easy to create a folder in /usr/share and to copy files into it.
4. On Linux, create a folder named nltk_data in /usr/share; on Windows, create the same folder in C:\.
5. Paste the entire contents of nltk_data-gh-pages (which you just extracted) into the nltk_data folder you just created.
6. From the nltk_data/packages folder, copy all the folders and paste them into the nltk_data folder. Now you are done.
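
The manual steps above can also be scripted. The following is a rough sketch, assuming the archive has a single top-level folder (such as nltk_data-gh-pages) with a packages folder inside it. The function name install_nltk_data and its arguments are hypothetical and not part of NLTK, and dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
import zipfile
from pathlib import Path


def install_nltk_data(zip_path, target_dir):
    """Extract the nltk_data archive and copy packages/* into target_dir."""
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)
    extract_dir = zip_path.parent / (zip_path.stem + "_extracted")

    # Step 2: extract the downloaded archive.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    # Assumption: one top-level folder containing a "packages" folder.
    packages_dirs = list(extract_dir.glob("*/packages"))
    if not packages_dirs:
        raise FileNotFoundError("no 'packages' folder found in the archive")

    # Steps 4-6: create nltk_data and copy each package folder into it.
    target_dir.mkdir(parents=True, exist_ok=True)
    for item in packages_dirs[0].iterdir():
        if item.is_dir():
            shutil.copytree(item, target_dir / item.name, dirs_exist_ok=True)
    return target_dir
```

For example, install_nltk_data("nltk_data-gh-pages.zip", "/usr/share/nltk_data") would need to run with sufficient privileges to write to /usr/share.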

Since this is my first answer, I may not have explained the process clearly. If you have trouble following these steps, please leave a comment.

Answer 3:

I run NLTK 3.2.5 and Python 3.6 on Windows 10. I use this script:

nltk.set_proxy('http://user:password@proxy.example.com:3128')
nltk.download()
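
One caveat when embedding credentials in the proxy URL like this: characters such as @ or : in the username or password can make the URL ambiguous, so they are usually percent-encoded. Below is a small sketch using the standard library. The helper name build_proxy_url is hypothetical, and whether your proxy accepts credentials in this form is an assumption to check.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user, password, scheme="http"):
    """Return scheme://user:password@host:port with encoded credentials."""
    # safe="" makes quote() escape "/" as well, which it skips by default.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{userinfo}@{host}:{port}"


print(build_proxy_url("proxy.example.com", 3128, "user", "p@ss:word"))
# prints: http://user:p%40ss%3Aword@proxy.example.com:3128
```

The resulting string can then be passed as the single argument to nltk.set_proxy, as in the script above.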

Answer 4:

The options suggested above did not work for me. Here is what worked in my Windows environment: remove the parentheses around the username and password.

nltk.set_proxy('http://proxy.example.com:3128', 'USERNAME', 'PASSWORD')

Answer 5:

I run NLTK 3.0 and Python 3.4 in a Windows environment, and proxy authentication works well if I remove the parentheses around the username and password. So use this script:

nltk.set_proxy('http://proxy.example.com:3128', 'username', 'password')
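
For readers who want to see the general mechanism behind a proxy URL plus a separate username and password, here is a sketch built only on Python 3's urllib.request. It illustrates how a proxy handler and a basic-auth handler can be combined and is not presented as NLTK's own implementation. The function name build_proxy_opener is hypothetical.

```python
import urllib.request


def build_proxy_opener(proxy_url, user=None, password=""):
    """Build an opener that routes HTTP(S) through proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    handlers = [proxy_handler]

    password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    if user is not None:
        # realm=None registers the credentials for any realm at this URI.
        password_manager.add_password(
            realm=None, uri=proxy_url, user=user, passwd=password
        )
        handlers.append(urllib.request.ProxyBasicAuthHandler(password_manager))

    opener = urllib.request.build_opener(*handlers)
    return opener, password_manager


opener, manager = build_proxy_opener(
    "http://proxy.example.com:3128", "username", "password"
)
# urllib.request.install_opener(opener) would make it the process default.
```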

Answer 6:

You can also set the system proxy in bash by setting the appropriate environment variables.

Some of the proxy settings which I keep are:

http_proxy=http://127.0.0.1:3129/
ftp_proxy=http://127.0.0.1:3129/
all_proxy=socks://127.0.0.1:3129/
https_proxy=http://127.0.0.1:3129/

You can make the environment variable changes permanent by editing your ~/.bashrc file. Sample edit:

export http_proxy=http://127.0.0.1:3129/
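
To check from Python that these variables are actually visible, the standard library's urllib.request.getproxies() can be used. The sketch below sets the variables for the current process only (the address is the sample value from this answer) and then reads back what Python detects:

```python
import os
import urllib.request

# Equivalent to `export http_proxy=...`, but only for this Python process.
os.environ["http_proxy"] = "http://127.0.0.1:3129/"
os.environ["https_proxy"] = "http://127.0.0.1:3129/"

# getproxies() returns a dict mapping scheme names to proxy URLs.
detected = urllib.request.getproxies()
print(detected.get("http"))
print(detected.get("https"))
```

If the variables are exported in the shell before Python starts, the two assignments are unnecessary and getproxies() alone should report them.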

Answer 7:

To install an NLTK corpus manually:

1) Go to http://www.nltk.org/nltk_data/ and download the NLTK corpus file you want.

2) In a Python shell, check the value of nltk.data.path.

3) Choose one of the paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.

4) Now you can import the data: from nltk.corpus import stopwords
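
Steps 2 and 3 can be sketched in Python as follows. The function unzip_corpus is a hypothetical helper, not an NLTK function. It takes the list of search paths as an argument so that it can run without NLTK installed.

```python
import zipfile
from pathlib import Path


def unzip_corpus(corpus_zip, search_paths):
    """Unzip corpus_zip into the corpora folder of the first existing path."""
    for candidate in search_paths:
        base = Path(candidate)
        if base.is_dir():
            corpora_dir = base / "corpora"
            corpora_dir.mkdir(exist_ok=True)
            with zipfile.ZipFile(corpus_zip) as archive:
                archive.extractall(corpora_dir)
            return corpora_dir
    raise FileNotFoundError("none of the search paths exist on this machine")
```

With NLTK installed, the call would look like unzip_corpus("stopwords.zip", nltk.data.path), where stopwords.zip is the file downloaded in step 1.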

Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9

Answer 8:

To be honest, the accepted solution doesn't work for me. And I'm also afraid of leaking my password since we need to specify it explicitly.

Rather than using nltk.download() inside the Python console, running python -m nltk.downloader all in cmd (on Windows) works great for me!

P.S. For Windows users: remember to turn off your proxy server before running the command. Go to Internet Explorer -> gear icon at the top right -> Internet Options -> Connections -> LAN settings -> uncheck "Use a proxy server ... VPN connections)." -> OK

This approach is also described in the official documentation: https://www.nltk.org/data.html#command-line-installation

Answer 9:

I could make it work with:

nltk.set_proxy('http://user_name:password@proxy_ip_adress:3128')

Source: https://stackoverflow.com/questions/13908615/nltk-set-proxy-server

Tags

python

nltk

proxy-server