Question
I'm trying to install NLTK for Python 3.4. The actual NLTK module appears to have installed fine. I then ran
import nltk
nltk.download()
and chose to download everything. However, after it was done, the window simply says 'out of date'. I tried refreshing and downloading again, yet it stays 'out of date', as shown here: [screenshot: NLTK Window 1]
I looked online and tried various fixes, but I haven't found any that helped my case yet.
I also tried to manually find the missing parts, which turned out to be 'Open Multilingual Wordnet' and 'Wordnet'. Here's how I found which parts were missing: [screenshot: Open Multilingual Wordnet]
What should I do? Should I uninstall and reinstall NLTK? I haven't really found a way to delete the packages (except for manually deleting it).
EDIT: Regarding Solution 2 and Solution 3, here is more detail on the Solution 2 issue.
If something has successfully downloaded, this is the output:
>>> nltk.download('subjectivity')
[nltk_data] Downloading package subjectivity to
[nltk_data] C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data] Package subjectivity is already up-to-date!
True
However, for 'wordnet' and 'omw', this is what happens when I redownload:
>>> nltk.download('omw')
[nltk_data] Downloading package omw to
[nltk_data] C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\omw.zip.
True
Answer 1:
In short:
Don't use the GUI; download all packages from within the Python interpreter.
$ python3
>>> import nltk
>>> nltk.download('all')
In long:
It might be because of the recent addition of Open Multilingual WordNet, with something not working right between the NLTK download GUI and the package indices.
Solution 1:
Simply use the nltk.download() GUI and download the two packages without selecting all. (This may not work, but it's worth a try.)
Solution 2:
Install the packages individually through the Python interpreter:
>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw') # Open Multilingual WordNet
Solution 3:
Let nltk.download('all') check through all packages in its index and download any that are not available.
>>> import nltk
>>> nltk.download('all')
Note: If any files were corrupted, possibly due to a broken internet connection, simply find the directory where NLTK data is stored and then proceed with Solution 3.
To find where nltk_data is stored, check nltk.data.path, which lists the possible locations:
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
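Not every entry in that list necessarily exists on disk. As a minimal sketch, the helper below filters a list of candidate paths down to the directories that are actually present. The name existing_data_dirs is hypothetical and not part of NLTK; in practice you would pass it nltk.data.path.

```python
import os


def existing_data_dirs(candidate_paths):
    """Return the candidate paths that exist as directories, in order."""
    return [p for p in candidate_paths if os.path.isdir(p)]


if __name__ == "__main__":
    try:
        import nltk  # only needed to obtain the real search path
        candidates = nltk.data.path
    except ImportError:
        candidates = []  # NLTK not installed: nothing to check
    for data_dir in existing_data_dirs(candidates):
        print(data_dir)
```

Whatever it prints is where the downloader's files actually live, which is the directory to inspect if a download looks corrupted.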
The point of downloading the data is to use it, so the real test is whether the components you need actually work. If those are wordnet and omw, you can try this:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0]
Synset('bank.n.01')
>>> wn.synsets('bank')[0].lemma_names('spa')
['margen', 'orilla', 'vera']
>>> wn.synsets('bank')[0].lemma_names('fre')
['rive', 'banque']
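If you would rather check the files on disk than load the corpus, something like the following may help. It is only a sketch: find_corpus is a hypothetical helper, and it assumes the layout suggested by the downloader output in the question (a corpora folder holding either an unpacked directory or a .zip per package).

```python
import os


def find_corpus(data_dirs, name):
    """Return the first path holding corpus `name`, or None if absent."""
    for base in data_dirs:
        unpacked = os.path.join(base, "corpora", name)
        zipped = unpacked + ".zip"
        # Prefer the unpacked directory, fall back to the archive.
        if os.path.isdir(unpacked):
            return unpacked
        if os.path.isfile(zipped):
            return zipped
    return None
```

For example, find_corpus(nltk.data.path, 'wordnet') and find_corpus(nltk.data.path, 'omw') should each return a path if the two packages from the question are present.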
Don't worry so much about what is shown in the GUI. Once nltk.download('all') completes without errors, you have all the corpora and models that NLTK supports.
As good practice, though, please raise an issue at https://github.com/nltk/nltk_data/issues so that the developers can check whether the problem can be replicated. Include some more screenshots of the error, before and after trying the proposed solutions too =)
Answer 2:
Don't worry about the "out of date" messages; it's a waste of your time. Just go ahead and use the NLTK.
The NLTK's data resources are almost entirely independent of each other. You might never have reason to use either of the packages that are marked as "out of date", but even if you do, chances are they are in fact fully installed and usable.
Still, it's happened to me too, and this is what I found: it seems that the downloader will consider a resource to be "out of date" if it detects files in its download folder that are not in the resource manifest. Perhaps this is sometimes caused by misconfigured resources, but if you've visited the resources in question with a directory browser, you may have caused the mismatch through stray files left behind by your GUI, or your editor, or who knows what. E.g., on a Mac the Finder will leave a .DS_Store file in directories it visits.
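To see whether such stray files are present, a small scan along these lines might do. This is a sketch under assumptions: find_stray_files is a hypothetical helper, and the set of names beyond .DS_Store (Thumbs.db, desktop.ini) is my own guess at common file-browser leftovers, not something the downloader documents.

```python
import os

# Assumed list of file-browser leftovers; adjust for your system.
STRAY_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}


def find_stray_files(root, stray_names=STRAY_NAMES):
    """Walk `root` and return sorted paths of files whose name is in stray_names."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if fname in stray_names:
                hits.append(os.path.join(dirpath, fname))
    return sorted(hits)
```

Calling find_stray_files on your nltk_data directory lists the candidates; whether removing them clears the "out of date" label is untested here.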
But as I said, the "problem" is not really worth fixing. Enjoy the NLTK!
PS. As far as I know, the best (and really only) way to refresh your nltk_data directory is to delete the whole thing and download again.
Source: https://stackoverflow.com/questions/33183618/nltk-data-out-of-date-python-3-4
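A minimal sketch of that delete step, assuming you have already identified the right directory (for instance one of the entries in nltk.data.path). The helper name remove_data_dir is hypothetical, and the call is destructive, so double-check the path before running it.

```python
import os
import shutil


def remove_data_dir(data_dir):
    """Delete data_dir and everything in it; return True if it existed."""
    if not os.path.isdir(data_dir):
        return False
    shutil.rmtree(data_dir)
    return True

# Afterwards, download again from the interpreter:
#   >>> import nltk
#   >>> nltk.download('all')
```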