NLTK v3.2: Unable to nltk.pos_tag()

Posted by 好久不见 on 2019-11-27 09:46:07
MananVyas

EDITED

This issue has been resolved as of NLTK v3.2.1. Upgrading NLTK fixes it, e.g. pip install -U nltk.


I faced the same issue, and the error I encountered was as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 414, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1206, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>

The URLError you mentioned is caused by a bug in the perceptron.py file of the NLTK library on Windows. On my machine, the file is at this location:

C:\Python27\Lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py

(Look for the equivalent location on your machine, wherever your Python27 folder is.)

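The bug is in the code that locates the averaged_perceptron_tagger model on your machine. The location it produces is a plain Windows path starting with a drive letter such as C:, and that drive letter is then read as a URL scheme, which is why the error says "unknown url type: c". See lines 801 and 924 of data.py, both of which appear in the traceback above.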
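To see why a Windows path ends up reported as "unknown url type: c", here is a minimal sketch using the standard library's URL parser. It is only an illustration: the path below is a hypothetical stand-in for whatever NLTK finds on your machine, url_scheme is a helper made up for this example, and NLTK's own code in data.py may parse the string differently.

```python
from urllib.parse import urlsplit  # in Python 2.7 this function lives in the urlparse module

# Hypothetical Windows path, standing in for the location NLTK resolves on a real machine.
windows_path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"


def url_scheme(resource):
    """Return the scheme a URL parser assigns to the given string."""
    return urlsplit(resource).scheme


# Without a prefix, everything before the first colon (the drive letter) is taken as the scheme.
bare_scheme = url_scheme(windows_path)

# With an explicit 'file:' prefix, the scheme is unambiguous.
prefixed_scheme = url_scheme("file:" + windows_path)

print(bare_scheme)      # c
print(prefixed_scheme)  # file
```

No URL opener knows a scheme called "c", so the bare path fails, while the 'file:' prefix gives the string a scheme that can be handled.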

The NLTK developers appear to have fixed this bug recently. Have a look at this commit, made a few days ago:

https://github.com/nltk/nltk/commit/d3de14e58215beebdccc7b76c044109f6197d1d9#diff-26b258372e0d13c2543de8dbb1841252

The snippet where the change was made is as follows:

        self.tagdict = {}
        self.classes = set()
        if load:
            # Initially it was: AP_MODEL_LOC = str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            self.load(AP_MODEL_LOC)

    def tag(self, tokens):

Updating the file to the most recent commit worked for me, and I was able to use nltk.pos_tag again. I believe this will resolve your problem as well (assuming everything else is set up correctly).

alvas

EDITED

This issue has been resolved as of NLTK v3.2.1. Please upgrade your NLTK!


First, read @MananVyas's answer for the why:

https://stackoverflow.com/a/35902494/610569


Here's the how: if you want to stay on NLTK v3.2 rather than downgrade to v3.1, you can use this "hack":

>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
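Since the answers here report the problem only for NLTK v3.2, the hack could be applied behind a version check so that it stops being used after an upgrade. A minimal sketch, assuming v3.2 is the only affected release (as stated above); needs_file_prefix_workaround is a hypothetical helper, not part of NLTK:

```python
import re


def needs_file_prefix_workaround(version):
    """Return True only for NLTK 3.2, the release reported as affected above."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each component
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad with zeros so that "3.2" and "3.2.0" compare equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts) == (3, 2, 0)


print(needs_file_prefix_workaround("3.2"))    # True
print(needs_file_prefix_workaround("3.2.1"))  # False
```

In practice you would pass nltk.__version__ to the helper and fall back to the plain nltk.pos_tag call whenever it returns False.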

Sarim Hussain

I faced the same issue a while back. Solution:

import nltk
nltk.download('averaged_perceptron_tagger')