My Code:
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
ERROR Message:
[ec2-user@ip-
After adding this line of code, the issue will be fixed:
nltk.download('punkt')
A plain nltk.download() call did not solve this issue for me. I tried the following and it worked:
In the NLTK data folder (nltk_data), create a tokenizers folder and copy your punkt folder into it.
This will work. The folder structure needs to place the tokenizer at tokenizers/punkt/english.pickle inside the data folder.
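A quick way to confirm that layout is to check each data directory for a tokenizers/punkt subfolder. The following is a minimal sketch using only the standard library; the helper name find_punkt_dirs and the list of candidate directories are illustrative assumptions, not part of NLTK.

```python
import os


def find_punkt_dirs(search_paths):
    """Return every <path>/tokenizers/punkt directory that exists."""
    found = []
    for base in search_paths:
        candidate = os.path.join(base, "tokenizers", "punkt")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    # Assumption: typical per-user and system-wide data locations on Linux.
    candidates = [
        os.path.join(os.path.expanduser("~"), "nltk_data"),
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
    ]
    hits = find_punkt_dirs(candidates)
    print(hits if hits else "punkt not found under any candidate directory")
```

If NLTK is importable, passing nltk.data.path as search_paths checks the directories NLTK itself searches, which is probably the more reliable choice.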
Execute the following code:
import nltk
nltk.download()
After this, the NLTK downloader will open.
From the shell you can execute:
sudo python -m nltk.downloader punkt
If you want to install the popular NLTK corpora/models:
sudo python -m nltk.downloader popular
If you want to install all NLTK corpora/models:
sudo python -m nltk.downloader all
To list the resources you have downloaded:
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("corpora")))'
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("tokenizers")))'
import nltk
nltk.download('punkt')
Open the Python prompt and run the above statements.
The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages, so it knows which punctuation and characters mark the end of one sentence and the beginning of the next.