Resource u'tokenizers/punkt/english.pickle' not found

粉色の甜心 2020-12-13 01:49

My Code:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

ERROR Message:

[ec2-user@ip-
17 Answers
  • 2020-12-13 02:29

    After adding this line of code, the issue will be fixed:

    nltk.download('punkt')
    
  • 2020-12-13 02:31

    A plain nltk.download() by itself did not solve this issue. I tried the steps below and they worked for me:

    Inside your NLTK data folder, create a tokenizers folder and copy your punkt folder into that tokenizers folder.

    This will work, provided the folder structure matches (the original answer showed it in a screenshot).
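
    For reference, the expected layout under the NLTK data directory (assuming the default ~/nltk_data location) is roughly:

    ```
    nltk_data/
    └── tokenizers/
        └── punkt/
            ├── english.pickle
            └── ...
    ```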

  • 2020-12-13 02:32
    1. Execute the following code:

      import nltk
      nltk.download()
      
    2. After this, the NLTK Downloader window will pop up.

    3. Select All packages.
    4. Download punkt.
  • 2020-12-13 02:38

    From the shell you can execute:

    sudo python -m nltk.downloader punkt 
    

    If you want to install the popular NLTK corpora/models:

    sudo python -m nltk.downloader popular
    

    If you want to install all NLTK corpora/models:

    sudo python -m nltk.downloader all
    

    To list the resources you have downloaded:

    python -c 'import os, nltk; print(os.listdir(nltk.data.find("corpora")))'
    python -c 'import os, nltk; print(os.listdir(nltk.data.find("tokenizers")))'
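
    If the resource still cannot be found after downloading, it helps to check which directories NLTK actually searches; a minimal sketch:

    ```python
    import nltk

    # nltk.data.path lists the directories NLTK searches for data,
    # in order; downloads usually end up in ~/nltk_data
    for path in nltk.data.path:
        print(path)
    ```

    If the directory you downloaded punkt into is not in this list, NLTK will never find it.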
    
  • 2020-12-13 02:38
    import nltk
    nltk.download('punkt')
    

    Open the Python prompt and run the above statements.

    The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.
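
    The behavior described above can be sketched as follows. Note that newer NLTK releases (>= 3.8.2) ship the pre-trained Punkt model as punkt_tab rather than punkt, so this sketch requests both to be safe:

    ```python
    import nltk

    # Fetch the pre-trained Punkt model; the resource name depends
    # on the NLTK version, so request both variants
    nltk.download('punkt', quiet=True)
    nltk.download('punkt_tab', quiet=True)

    from nltk.tokenize import sent_tokenize

    # The English Punkt model knows "Mr." is an abbreviation,
    # so the first period does not end the sentence
    text = "Mr. Smith went home. It was late."
    print(sent_tokenize(text))
    ```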
