How to install NLTK modules in Heroku

Submitted by 有些话、适合烂在心里 on 2020-12-29 09:33:08

Question


Hey, I'd like to install the NLTK pos_tag function on my Heroku server. How can I do so? Please give me the steps, as I'm new to the Heroku server system.


Answer 1:


I just added official nltk support to the buildpack!

Simply add an nltk.txt file with a list of the corpora you want installed, and everything should work as expected.
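
As a minimal sketch of what that file could look like, the shell snippet below writes an nltk.txt with one identifier per line. The two identifiers (punkt and averaged_perceptron_tagger) are borrowed from other answers in this thread as a plausible set for pos_tag; that choice is an assumption, and resource names can differ between NLTK versions, so check them against the version you deploy.

```shell
#!/usr/bin/env bash
# Sketch: create nltk.txt in the current directory (assumed to be the project root).
# The identifiers below are illustrative; replace them with the ones you need.
printf '%s\n' punkt averaged_perceptron_tagger > nltk.txt

# Show the result: one identifier per line.
cat nltk.txt
```

Using printf keeps the file free of Windows line endings, which another answer in this thread flags as a common cause of failed downloads.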




Answer 2:


Update

As Kenneth Reitz pointed out, a much simpler solution has been added to heroku-buildpack-python. Add an nltk.txt file to your root directory and list your corpora inside, one per line. See https://devcenter.heroku.com/articles/python-nltk for details.


Original Answer

Here's a solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.

I used similar steps to install TextBlob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 that should work for an NLTK-only installation.

The default Heroku Python buildpack includes a post_compile step that runs after all of the default build steps have been completed:

#!/usr/bin/env bash
# post_compile (from the buildpack)

if [ -f bin/post_compile ]; then
    echo "-----> Running post-compile hook"
    chmod +x bin/post_compile
    sub-env bin/post_compile
fi

As you can see, it looks in your project directory for your own post_compile file in the bin directory, and it runs it if it exists. You can use this hook to install the nltk data.

  1. Create the bin directory in the root of your local project.

  2. Add your own post_compile file to the bin directory.

    #!/usr/bin/env bash
    # bin/post_compile
    
    if [ -f bin/install_nltk_data ]; then
        echo "-----> Running install_nltk_data"
        chmod +x bin/install_nltk_data
        bin/install_nltk_data
    fi
    
    echo "-----> Post-compile done"
    
  3. Add your own install_nltk_data file to the bin directory.

    #!/usr/bin/env bash
    # bin/install_nltk_data
    
    source "$BIN_DIR/utils"
    
    echo "-----> Starting nltk data installation"
    
    # Assumes NLTK_DATA environment variable is already set
    # $ heroku config:set NLTK_DATA='/app/nltk_data'
    
    # Install the nltk data
    # NOTE: The following command installs the averaged_perceptron_tagger package,
    # so you may want to change it for your specific needs.
    # See http://www.nltk.org/data.html
    python -m nltk.downloader averaged_perceptron_tagger
    
    # If using Textblob, use this instead:
    # python -m textblob.download_corpora lite
    
    # Open the NLTK_DATA directory (abort if it is missing, so find never runs elsewhere)
    cd "${NLTK_DATA}" || exit 1
    
    # Delete all of the zip files
    find . -name "*.zip" -type f -delete
    
    echo "-----> Finished nltk data installation"
    
  4. Add nltk to your requirements.txt file (or textblob if you are using TextBlob).

  5. Commit all of these changes to your repo.

  6. Set the NLTK_DATA environment variable on your Heroku app.

    $ heroku config:set NLTK_DATA='/app/nltk_data'
    
  7. Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the nltk download.
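
The cleanup at the end of install_nltk_data can be tried locally before deploying. This is a minimal sketch using a throwaway directory and placeholder files (the layout and file names are made up for illustration, not real NLTK data); it only shows that the find command removes the .zip archives and leaves the extracted files alone.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the zip cleanup in a temporary directory.
# Directory layout and file names are hypothetical stand-ins for real NLTK data.
NLTK_DATA="$(mktemp -d)"
mkdir -p "$NLTK_DATA/taggers/averaged_perceptron_tagger"
touch "$NLTK_DATA/taggers/averaged_perceptron_tagger.zip"
touch "$NLTK_DATA/taggers/averaged_perceptron_tagger/model.pickle"

# Abort if the directory is missing, so find never runs somewhere unexpected.
cd "$NLTK_DATA" || exit 1

# Delete the downloaded archives; the unpacked files stay in place.
find . -name "*.zip" -type f -delete

# List what is left.
find . -type f
```

Guarding the cd matters because an unset NLTK_DATA would otherwise make cd fall back to the home directory, and find would then delete zip files there.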

I hope you found this helpful! Enjoy!




Answer 3:


Follow the instructions in "Installing NLTK on Heroku", or look into the heroku-buildpack-python-sklearn repository.




Answer 4:


If you only need simple functionality such as pos_tag, tokenizing, or stemming, you can do the following:

  1. List nltk in requirements.txt.
  2. List the following modules in nltk.txt, one per line:
    • wordnet
    • pros_cons
    • reuters
    • hmm_treebank_pos_tagger
    • maxent_treebank_pos_tagger
    • universal_tagset
    • punkt
    • averaged_perceptron_tagger_ru
    • averaged_perceptron_tagger
    • snowball_data
    • rslp
    • porter_test
    • vader_lexicon
    • treebank
    • dependency_treebank



Answer 5:


You need to follow the steps below.

  1. nltk.txt needs to be present in the root folder of the project.
  2. Add the modules you want to download, such as punkt and stopwords, one per line.
  3. Change the line endings from Windows (CRLF) to Unix (LF).

Changing the line endings is a very important step. It can be done easily in Sublime Text or Notepad++. In Sublime Text, it can be done from the View menu, under Line Endings.
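
If you would rather fix the line endings from a terminal, the following is a minimal sketch using standard Unix tools. It first creates a sample nltk.txt with Windows line endings purely for demonstration (skip that line for a real file), then strips the carriage returns with tr.

```shell
#!/usr/bin/env bash
# Sketch: convert nltk.txt from Windows (CRLF) to Unix (LF) line endings.

# Demo only: create a file with CRLF endings. Skip this for your real nltk.txt.
printf 'punkt\r\nstopwords\r\n' > nltk.txt

# Strip every carriage return and replace the original file.
tr -d '\r' < nltk.txt > nltk.txt.tmp && mv nltk.txt.tmp nltk.txt

# Verify: report whether any carriage return is left.
if grep -q "$(printf '\r')" nltk.txt; then
    echo "nltk.txt still contains Windows line endings"
else
    echo "nltk.txt uses Unix line endings"
fi
```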

Hope this helps



Source: https://stackoverflow.com/questions/18385303/how-to-install-nltk-modules-in-heroku
