My Code:
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
ERROR Message:
[ec2-user@ip-
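When the Punkt model is simply missing, `nltk.data.load` raises `LookupError`. Below is a minimal sketch that catches that error, downloads the package once and retries. The helper name `load_with_fallback` is hypothetical, and the loader and downloader are passed in as parameters (a design choice of this sketch, so the logic can be exercised without network access). In real use you would pass `nltk.data.load` and `nltk.download`. It assumes an NLTK release that still ships the pickled `english.pickle` model, as in the question.

```python
from typing import Any, Callable


def load_with_fallback(
    resource_url: str,
    package_id: str,
    load: Callable[[str], Any],
    download: Callable[[str], Any],
) -> Any:
    """Load an NLTK resource, downloading its package once if it is missing.

    Hypothetical helper: `load` and `download` are injected; in practice
    pass nltk.data.load and nltk.download.
    """
    try:
        return load(resource_url)
    except LookupError:
        # NLTK raises LookupError when the resource is not on nltk.data.path.
        download(package_id)
        # Retry once; if the download did not help, this raises LookupError again.
        return load(resource_url)


# Usage with NLTK (not executed here):
#   import nltk
#   tokenizer = load_with_fallback(
#       "tokenizers/punkt/english.pickle", "punkt", nltk.data.load, nltk.download
#   )
```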
Go to the Python console by typing
$ python
in your terminal. Then type the following commands in the Python shell to download the respective packages:
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
This solved the issue for me.
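To do the same without an interactive shell (for example in a setup script), here is a minimal sketch that downloads only the packages that are not already installed. The helper name `ensure_packages` is hypothetical. It relies on `nltk.data.find` raising `LookupError` for a missing resource, and it re-checks after each download instead of trusting the downloader's return value. `find` and `download` are injected so the sketch runs without NLTK; pass `nltk.data.find` and `nltk.download` in real use.

```python
from typing import Callable, Dict, List


def ensure_packages(
    resources: Dict[str, str],
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> List[str]:
    """Download only the packages whose resource cannot be found.

    `resources` maps an NLTK package id to the resource path that
    nltk.data.find looks up. Returns the ids that had to be downloaded.
    """
    downloaded: List[str] = []
    for package_id, resource_path in resources.items():
        try:
            find(resource_path)  # raises LookupError when the resource is absent
        except LookupError:
            download(package_id)
            # Check again instead of trusting the downloader's return value;
            # this raises LookupError if the download did not help.
            find(resource_path)
            downloaded.append(package_id)
    return downloaded


# Usage with NLTK (not executed here):
#   import nltk
#   ensure_packages(
#       {"punkt": "tokenizers/punkt",
#        "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger"},
#       nltk.data.find,
#       nltk.download,
#   )
```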
I faced the same issue. After downloading everything, I still got the 'punkt' error. I looked for the package on my Windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and could see 'punkt.zip' there. I realized that, somehow, the zip had not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punkt. Once I extracted the zip, it worked like a charm.
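The manual extraction described in this answer can be scripted with the standard library. This is a minimal sketch: the helper name `extract_punkt` is hypothetical, and it assumes the archive contains a top-level `punkt/` folder, so that extracting it next to the zip produces `tokenizers/punkt`.

```python
import zipfile
from pathlib import Path


def extract_punkt(tokenizers_dir: str) -> Path:
    """Extract <tokenizers_dir>/punkt.zip in place and return <tokenizers_dir>/punkt.

    Hypothetical helper; assumes the archive's top-level folder is 'punkt'.
    """
    base = Path(tokenizers_dir)
    archive = base / "punkt.zip"
    target = base / "punkt"
    if target.is_dir():
        return target  # already extracted, nothing to do
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(base)  # writes the archive's 'punkt/' folder next to the zip
    if not target.is_dir():
        raise FileNotFoundError(f"{archive} did not contain a 'punkt' folder")
    return target


# Usage (path taken from the answer; adjust to your own machine):
#   extract_punkt(r"C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers")
```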
You need to rearrange your folders: move your tokenizers folder into the nltk_data folder.
This doesn't work if your nltk_data folder contains a corpora folder, which in turn contains the tokenizers folder.
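For the nested layout described in this comment (nltk_data/corpora/tokenizers), one possible fix is to move the tokenizers folder up one level so it sits directly under nltk_data. A minimal stdlib sketch; the helper name `unnest_tokenizers` is hypothetical, and it deliberately refuses to overwrite an existing nltk_data/tokenizers folder rather than guessing how to merge the two.

```python
import shutil
from pathlib import Path


def unnest_tokenizers(nltk_data_dir: str) -> bool:
    """Move <nltk_data>/corpora/tokenizers to <nltk_data>/tokenizers.

    Hypothetical helper. Returns True if a move happened, False if there
    was nothing misplaced.
    """
    root = Path(nltk_data_dir)
    nested = root / "corpora" / "tokenizers"
    target = root / "tokenizers"
    if not nested.is_dir():
        return False  # nothing misplaced
    if target.exists():
        raise FileExistsError(f"{target} already exists; merge the folders by hand")
    shutil.move(str(nested), str(target))
    return True
```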
If you're looking to download only the punkt model:
import nltk
nltk.download('punkt')
If you're unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:
import nltk
nltk.download('popular')
With the above command, there is no need to use the GUI to download the datasets.
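If the default location is not writable (or you want the data next to your project), `nltk.download` accepts a `download_dir` argument, and NLTK searches the directories listed in `nltk.data.path`. A minimal sketch combining the two; the helper name `download_to_local_dir` and the project-local folder are assumptions for illustration. `download` and `search_path` are injected so the sketch runs without NLTK; pass `nltk.download` and `nltk.data.path` in real use.

```python
import os
from typing import Any, Callable, List


def download_to_local_dir(
    package_id: str,
    data_dir: str,
    download: Callable[..., Any],
    search_path: List[str],
) -> None:
    """Download `package_id` into `data_dir` and make sure it is searched.

    Hypothetical helper: download=nltk.download, search_path=nltk.data.path.
    """
    os.makedirs(data_dir, exist_ok=True)
    download(package_id, download_dir=data_dir)
    if data_dir not in search_path:
        search_path.append(data_dir)  # register the folder once, at the end of the list


# Usage with NLTK (not executed here):
#   import nltk
#   download_to_local_dir("punkt", "./nltk_data", nltk.download, nltk.data.path)
```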
None of the above worked for me, so I downloaded all the files by hand from the website http://www.nltk.org/nltk_data/ and also placed them by hand in a "tokenizers" folder inside the "nltk_data" folder. Not a pretty solution, but still a solution.
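After placing files by hand, it helps to confirm that they ended up in a directory NLTK actually searches. This stdlib sketch checks each candidate directory for an extracted `tokenizers/punkt` folder; the helper name `dirs_with_punkt` is hypothetical, and it only approximates NLTK's own lookup (`nltk.data.find` is the authoritative check). In real use, pass `nltk.data.path` as the list of directories.

```python
from pathlib import Path
from typing import Iterable, List


def dirs_with_punkt(search_dirs: Iterable[str]) -> List[str]:
    """Return the directories that contain an extracted tokenizers/punkt folder.

    Hypothetical helper; checks only the folder layout, not the files inside.
    """
    found = []
    for d in search_dirs:
        if (Path(d) / "tokenizers" / "punkt").is_dir():
            found.append(d)
    return found


# Usage with NLTK (not executed here):
#   import nltk
#   print(dirs_with_punkt(nltk.data.path))  # an empty list means a misplaced folder
```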
To add to alvas' answer, you can download only the punkt corpus:
nltk.download('punkt')
Downloading 'all' sounds like overkill to me, unless that's what you want.