ImportError: cannot import name PunktWordTokenizer


There appears to be a regression related to PunktWordTokenizer in NLTK 3.0.2. The issue was not present in 3.0.1; rolling back to that version or earlier fixes it.

>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer
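
Until you decide how to migrate, a defensive import works on both sides of the change. This is a minimal sketch, assuming either tokenizer is acceptable for your use case; note that their outputs differ, as explained further down:

# Use PunktWordTokenizer where it still exists (NLTK <= 3.0.1);
# otherwise fall back to WordPunctTokenizer (NLTK >= 3.0.2).
# Caveat: the two tokenizers split contractions differently (see below).
try:
    from nltk.tokenize import PunktWordTokenizer as Tokenizer
except ImportError:
    from nltk.tokenize import WordPunctTokenizer as Tokenizer

print(Tokenizer().tokenize("This's a test"))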

To solve this, try pip install -U nltk to upgrade your NLTK version.

PunktWordTokenizer was previously exposed to users, but it is no longer part of the public API. You can use WordPunctTokenizer instead:

from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")
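
On NLTK 3.0.2 this yields:

['text', 'to', 'tokenize']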

The difference is:

PunktWordTokenizer splits on punctuation but keeps it with the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.

For example, given the input: This's a test

PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
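
You can check this in a REPL (output shown for NLTK 3.0.2). The TreebankWordTokenizer line is an aside: on this example it reproduces the old Punkt-style grouping of the clitic, though it is not a guaranteed drop-in replacement:

>>> from nltk.tokenize import WordPunctTokenizer, TreebankWordTokenizer
>>> WordPunctTokenizer().tokenize("This's a test")
['This', "'", 's', 'a', 'test']
>>> TreebankWordTokenizer().tokenize("This's a test")
['This', "'s", 'a', 'test']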