ImportError: cannot import name PunktWordTokenizer


There appears to be a regression related to PunktWordTokenizer in NLTK 3.0.2. The issue was not present in 3.0.1; rolling back to that version or earlier fixes it.

>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer
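
Until you decide how to migrate, a defensive import works on both sides of the change. This is a minimal sketch, assuming either tokenizer is acceptable for your use case; note that their outputs differ, as explained further down:

# Use PunktWordTokenizer where it still exists (NLTK <= 3.0.1);
# otherwise fall back to WordPunctTokenizer (NLTK >= 3.0.2).
# Caveat: the two tokenizers split contractions differently (see below).
try:
    from nltk.tokenize import PunktWordTokenizer as Tokenizer
except ImportError:
    from nltk.tokenize import WordPunctTokenizer as Tokenizer

print(Tokenizer().tokenize("This's a test"))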

To solve this, try pip install -U nltk to upgrade your NLTK version.

PunktWordTokenizer was previously exposed to users, but it is no longer part of the public API. You can use WordPunctTokenizer instead:

from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")
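
On NLTK 3.0.2 this yields:

['text', 'to', 'tokenize']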

The difference is:

PunktWordTokenizer splits on punctuation but keeps it with the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.

For example, given the input: This's a test

PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
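
You can check this in a REPL (output shown for NLTK 3.0.2). The TreebankWordTokenizer line is an aside: on this example it reproduces the old Punkt-style grouping of the clitic, though it is not a guaranteed drop-in replacement:

>>> from nltk.tokenize import WordPunctTokenizer, TreebankWordTokenizer
>>> WordPunctTokenizer().tokenize("This's a test")
['This', "'", 's', 'a', 'test']
>>> TreebankWordTokenizer().tokenize("This's a test")
['This', "'s", 'a', 'test']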