What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?

青春惊慌失措 2020-12-18 20:56

I've got a short function to check whether a word is a real word by comparing it to the WordNet corpus from the Natural Language Toolkit. I'm calling this function from a

1 answer
  • 2020-12-18 21:13

    I ran your code and got the same error. The explanation comes first, followed by a working solution.

    LazyCorpusLoader is a proxy object that stands in for a corpus object before the corpus is loaded. (This prevents the NLTK from loading massive corpora into memory before you need them.) The first time this proxy object is accessed, however, it becomes the corpus you intend to load. That is to say, the LazyCorpusLoader proxy object transforms its __dict__ and __class__ into the __dict__ and __class__ of the corpus you are loading.
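
    The transformation described above can be illustrated with a minimal sketch. This is not NLTK's implementation: LazyProxy, Corpus, and the factory argument are hypothetical names chosen for the example, and the only technique shown is a proxy that replaces its own __dict__ and __class__ on first access.

```python
class Corpus:
    """Hypothetical stand-in for a real corpus reader."""

    def __init__(self, words):
        self.words = words

    def lemmas(self, word):
        return [w for w in self.words if w == word]


class LazyProxy:
    """Hypothetical proxy that turns into the real object on first use."""

    def __init__(self, factory):
        self.__factory = factory  # stored under the mangled name _LazyProxy__factory

    def __load(self):
        real = self.__factory()
        # Become the real object: take over its attributes and its class.
        self.__dict__ = real.__dict__
        self.__class__ = real.__class__

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. before loading.
        self.__load()
        return getattr(self, name)


corpus = LazyProxy(lambda: Corpus(["dog", "cat"]))
before = type(corpus).__name__   # class of the object before first use
result = corpus.lemmas("dog")    # first access triggers the load
after = type(corpus).__name__    # class of the same object afterwards
```

    After the first attribute access, the same object reports a different class, which is why code that still expects the proxy's private attributes can fail.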

    If you compare your code to your errors above, you can see that you received 9 errors when you tried to create 10 instances of your class. The first transformation of the LazyCorpusLoader proxy object into a WordNetCorpusReader object was successful. This action was triggered when you accessed wordnet for the first time:

    The First Thread

    from nltk.corpus import wordnet as wn
    def is_good_word(word):
        ...
        wn.ensure_loaded()  # `LazyCorpusLoader` conversion into `WordNetCorpusReader` starts
    

    The Second Thread

    When you begin to run your is_good_word function in a second thread, however, your first thread has not completely transformed the LazyCorpusLoader proxy object into a WordNetCorpusReader. wn is still a LazyCorpusLoader proxy object, so it begins the __load process again. Once it gets to the point where it tries to convert its __class__ and __dict__ into a WordNetCorpusReader object, however, the first thread has already converted the LazyCorpusLoader proxy object into a WordNetCorpusReader. My guess is that you are running into an error in the line with my comment below:

    class LazyCorpusLoader(object):
        ...
        def __load(self):
            ...
            corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)  # load corpus
            ...
            # self.__args == self._LazyCorpusLoader__args
            args, kwargs  = self.__args, self.__kwargs                       # most likely the line throwing the error
    

    Once the first thread has transformed the LazyCorpusLoader proxy object into a WordNetCorpusReader object, the mangled names will no longer work. The WordNetCorpusReader object will not have LazyCorpusLoader anywhere in its mangled names. (self.__args is equivalent to self._LazyCorpusLoader__args while the object is a LazyCorpusLoader object.) Thus you get the following error:

    AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
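
    The name-mangling failure can be replayed in a single thread, which makes it easier to see than the real race. The sketch below uses hypothetical classes Loader and Reader in place of LazyCorpusLoader and WordNetCorpusReader, and it simulates the two threads by running their steps one after the other.

```python
class Reader:
    """Hypothetical stand-in for WordNetCorpusReader."""

    def __init__(self):
        self.data = "corpus"


class Loader:
    """Hypothetical stand-in for LazyCorpusLoader."""

    def __init__(self):
        self.__args = ("wordnet",)  # stored as _Loader__args

    def read_args(self):
        # Inside this class body, self.__args always compiles to
        # self._Loader__args, whatever the object's class is at run time.
        return self.__args


proxy = Loader()
mangled_before = "_Loader__args" in vars(proxy)

# Simulate the first thread finishing its transformation.
reader = Reader()
proxy.__dict__ = reader.__dict__
proxy.__class__ = reader.__class__

# Simulate the second thread, which is still executing Loader code.
try:
    Loader.read_args(proxy)
    message = None
except AttributeError as exc:
    message = str(exc)
```

    Here message names the Reader class and the missing _Loader__args attribute, the same shape as the error quoted above.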
    

    An Alternative

    Because of this race, you need to access the wn object once before you start any threads, so that the transformation has finished by the time they run. Here is your code modified accordingly:

    from nltk.corpus import wordnet as wn
    from nltk.corpus import stopwords
    from nltk.corpus.reader.wordnet import WordNetError
    import sys
    import time
    import threading
    
    cachedStopWords = stopwords.words("english")
    
    
    def is_good_word(word):
        word = word.strip()
        if len(word) <= 2:
            return 0
        if word in cachedStopWords:
            return 0
        try:
            if len(wn.lemmas(str(word), lang='en')) == 0:     # no longer the first access of wn
                return 0
        except WordNetError as e:
            print("WordNetError on concept {}".format(word))
        except AttributeError as e:
            # Exceptions have no .message attribute in Python 3; format the exception itself
            print("Attribute error on concept {}: {}".format(word, e))
        except Exception:
            print("Unexpected error on concept {}: {}".format(word, sys.exc_info()[0]))
        else:
            return 1
        return 1
    
    
    class ProcessMetaThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
    
        def run(self):
            is_good_word('dog')
    
    
    def process_meta(numberOfThreads):
        print(wn.__class__)           # <class 'nltk.corpus.util.LazyCorpusLoader'>
        wn.ensure_loaded()            # first access to wn transforms it
        print(wn.__class__)           # <class 'nltk.corpus.reader.wordnet.WordNetCorpusReader'>
        threadsList = []
        for i in range(numberOfThreads):
            start = time.perf_counter()   # time.clock() was removed in Python 3.8
            t = ProcessMetaThread()
            print(time.perf_counter() - start)
            t.daemon = True               # replaces the deprecated setDaemon(True)
            t.start()
            threadsList.append(t)
    
        numComplete = 0
        while numComplete < numberOfThreads:
            # Iterate over the threads that were started
            for processNum in range(0, numberOfThreads):
                # Skip threads that were already counted as finished
                if threadsList[processNum] is not None:
                    # If the thread is finished, count it and clear its slot
                    if not threadsList[processNum].is_alive():
                        numComplete += 1
                        threadsList[processNum] = None
            time.sleep(5)
    
        print('Processes Finished')
    
    
    if __name__ == '__main__':
        process_meta(10)
    

    I have tested the above code and received no errors.
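
    If you cannot guarantee that the corpus is touched before any thread starts, another option (not part of the tested code above, so treat it as an assumption) is to funnel every first access through a lock. In this sketch, load_once and _load_lock are hypothetical names, and FakeCorpus only stands in for an object that exposes ensure_loaded() the way wn does.

```python
import threading

_load_lock = threading.Lock()


def load_once(corpus):
    """Force a lazily loaded corpus to load while holding a lock.

    `corpus` is assumed to expose ensure_loaded(), as the wn object
    in the answer does.
    """
    with _load_lock:
        corpus.ensure_loaded()
    return corpus


class FakeCorpus:
    """Hypothetical corpus that records overlapping ensure_loaded() calls."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self.calls = 0

    def ensure_loaded(self):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        self.calls += 1
        self.active -= 1


fake = FakeCorpus()
threads = [threading.Thread(target=load_once, args=(fake,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    Each thread would call load_once(wn) before using wn. The lock only helps if every thread goes through it, which is why loading once in the main thread remains the simpler fix.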
