Question
Any way to import Python's nltk.download('punkt') into Google Cloud Functions? I've found that adding the statement manually to my code in main.py significantly slows down my function, since punkt has to be downloaded every time the function runs. Is there a way to avoid this by loading punkt some other way?
EDIT #1: I changed my code and project structure to match what Barak suggested, but I keep getting the same error:
Error: function terminated. Recommended action: inspect logs for termination reason. Details:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/PY3/english.pickle
Searched in:
- '/tmp/nltk_data'
- '/env/nltk_data'
- '/env/share/nltk_data'
- '/env/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
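The "Searched in" list is the set of directories NLTK walks when it resolves tokenizers/punkt/PY3/english.pickle; the resource is found only if it sits under one of them. The sketch below is a simplified, stdlib-only stand-in for that lookup (find_resource is a hypothetical helper, not NLTK's API; the real nltk.data.find also handles zip archives and URL-style names). It shows why a folder bundled with the function, but absent from the list, is never found.

```python
import os
import tempfile
from typing import Iterable


def find_resource(resource: str, search_dirs: Iterable[str]) -> str:
    """Return the first existing file for `resource` under `search_dirs`.

    Simplified stand-in for NLTK's lookup: plain files only, no zip archives.
    """
    searched = []
    for base in search_dirs:
        # Resource names use "/" separators; convert them to the local OS form.
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.isfile(candidate):
            return candidate
        searched.append(base)
    raise LookupError(f"Resource {resource!r} not found. Searched in: {searched}")


if __name__ == "__main__":
    resource = "tokenizers/punkt/PY3/english.pickle"
    with tempfile.TemporaryDirectory() as project_root:
        # Simulate a function source folder that ships its own nltk_data.
        bundled = os.path.join(project_root, "nltk_data")
        target_dir = os.path.join(bundled, "tokenizers", "punkt", "PY3")
        os.makedirs(target_dir)
        open(os.path.join(target_dir, "english.pickle"), "wb").close()

        # A search path that doesn't mention the bundled folder: the lookup fails.
        default_dirs = [os.path.join(project_root, "tmp_nltk_data")]
        try:
            find_resource(resource, default_dirs)
        except LookupError as err:
            print(err)

        # Adding the bundled folder to the search path makes the lookup succeed.
        print(find_resource(resource, default_dirs + [bundled]))
```

Both answers below fix the error the same way: they make sure the uploaded data folder ends up on that search path, or they bypass the search with an absolute path.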
Answer 1:
Take a look at the instructions for uploading files with your Cloud Function. Since you can upload files alongside your code, you can point NLTK at those files instead of downloading them at runtime:
Following the official NLTK documentation, you can "Set your NLTK_DATA environment variable to point to your top level nltk_data folder."
Combining these together, you'd get:
- Download the data on your computer with:
python -m nltk.downloader punkt
- Upload the NLTK data directory (the documentation above explains where to find it on your computer) as an nltk_data directory at the root of your function's environment.
- Configure the code to find that folder:
import os
root = os.path.dirname(os.path.abspath(__file__))
nltk_dir = os.path.join(root, 'nltk_data')  # Your folder name here
# NLTK reads NLTK_DATA when it is first imported, so set this before `import nltk`
os.environ['NLTK_DATA'] = nltk_dir
EDIT: It seems that exporting the path through the environment variable doesn't have the desired effect, so let's make the path explicit in the code:
- On your computer, download the data:
import os
import nltk
download_dir = os.path.abspath('my_nltk_dir')
os.makedirs(download_dir, exist_ok=True)  # exist_ok lets the script be re-run safely
nltk.download('punkt', download_dir=download_dir)
- Put the my_nltk_dir directory in the same folder as your Python script, so the project looks like this:
PROJECT_ROOT/
|-- my_code.py
|-- my_nltk_dir/
    |-- ...
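- In your code, refer to the data using:
import os
import nltk
root = os.path.dirname(os.path.abspath(__file__))
download_dir = os.path.join(root, 'my_nltk_dir')
# An absolute path is opened directly instead of being searched for on nltk.data.path
tokenizer = nltk.data.load(os.path.join(download_dir, 'tokenizers/punkt/english.pickle'))
# tokenizer.tokenize(text) then splits text into sentences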
Answer 2:
Add nltk to your requirements.txt.
Install nltk on your local machine, if you haven't already:
pip install nltk
Then download the nltk_data files. In my case, for tokenizing, I needed the Punkt tokenizer models:
python -m nltk.downloader punkt
Copy them (on Windows they are under AppData\Roaming\) into an nltk_data folder in your root folder (i.e. alongside your function code):
cp -r C:\Users\<USER>\AppData\Roaming\nltk_data\* YOUR\ROOT\FOLDER\nltk_data\
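If you'd rather not depend on a shell-specific copy command, the same step can be done from Python. This is a minimal sketch, assuming the data was downloaded to NLTK's usual per-user location (%APPDATA%\nltk_data on Windows, ~/nltk_data elsewhere; NLTK may choose another writable directory, so check where yours landed). copy_nltk_data and default_user_nltk_data are hypothetical helper names.

```python
import os
import shutil
import sys
from pathlib import Path


def default_user_nltk_data() -> Path:
    """Typical per-user NLTK download folder (an assumption, not guaranteed)."""
    if sys.platform == "win32" and "APPDATA" in os.environ:
        return Path(os.environ["APPDATA"]) / "nltk_data"
    return Path.home() / "nltk_data"


def copy_nltk_data(src: Path, project_root: Path) -> Path:
    """Copy the downloaded NLTK data into <project_root>/nltk_data."""
    if not src.is_dir():
        raise FileNotFoundError(f"No NLTK data found at {src}")
    dest = project_root / "nltk_data"
    # dirs_exist_ok (Python 3.8+) allows re-running over an existing folder.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, copy_nltk_data(default_user_nltk_data(), Path("YOUR/ROOT/FOLDER")) produces the same nltk_data folder as the cp command above.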
At the beginning of your main Python function, or just before using nltk, add the following code. It computes the path of the nltk_data folder and tells nltk to look inside it:
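import os
import nltk
root = os.path.dirname(os.path.abspath(__file__))
download_dir = os.path.join(root, 'nltk_data')
# Not required, since download_dir is absolute; note it changes the working directory for the rest of the function
os.chdir(download_dir)
nltk.data.path.append(download_dir)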
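The same idea can be wrapped in a small helper that fails early with a clear message when the folder wasn't deployed, and that doesn't append the path again each time a warm instance re-runs the code. This is a minimal sketch; register_data_dir is a hypothetical name, and in the real function you would pass nltk.data.path as search_path and os.path.dirname(os.path.abspath(__file__)) as root.

```python
import os
from typing import List


def register_data_dir(search_path: List[str], root: str, folder: str = "nltk_data") -> str:
    """Append <root>/<folder> to `search_path` (e.g. nltk.data.path) at most once."""
    data_dir = os.path.join(os.path.abspath(root), folder)
    # Fail loudly at startup instead of later with NLTK's "Resource not found".
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"Bundled NLTK data not found at {data_dir}")
    # Avoid growing the list on every invocation of a warm instance.
    if data_dir not in search_path:
        search_path.append(data_dir)
    return data_dir
```

Typical use would be register_data_dir(nltk.data.path, os.path.dirname(os.path.abspath(__file__))) at module level, so it runs once per instance rather than on every request, with no os.chdir needed.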
Finally, after committing/pushing (if you're using Cloud Source Repos), (re)deploy your function!
Source: https://stackoverflow.com/questions/62209018/any-way-to-import-pythons-nltk-downloadpunkt-into-google-cloud-functions