Java Stanford NLP: Part of Speech labels?

借酒劲吻你 2020-11-27 08:59

The Stanford NLP, demo'd here, gives an output like this:

Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.

What do the Part of S

10 answers
  •  天涯浪人
    2020-11-27 09:41

    spaCy was quite fast for this, I think. Even on a low-end notebook, it runs like this:

    import spacy
    import time
    
    # Time the whole run, including model loading.
    start = time.time()
    
    with open('d:/dictionary/e-store.txt') as f:
        text = f.read()
    
    word = 0      # total number of tokens seen
    result = []   # text of every token tagged as NOUN
    
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    
    # Single pass over the tokens, checking the coarse POS tag of each one.
    for token in doc:
        if token.pos_ == "NOUN":
            result.append(token.text)
        word += 1
    
    elapsed = time.time() - start
    
    print("From", word, "words, there is", len(result), "NOUN found in", elapsed, "seconds")
    

    The output from several runs:

    From 3547 words, there is 913 NOUN found in 7.768507719039917 seconds
    From 3547 words, there is 913 NOUN found in 7.408619403839111 seconds
    From 3547 words, there is 913 NOUN found in 7.431427955627441 seconds
    

    So I don't think you need to worry about the cost of looping over every token to check its POS tag :)
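
The same single-pass idea can be tried without spaCy installed. This is only a minimal sketch: it works on the `word/TAG` line quoted in the question, and the helper names `parse_tagged` and `tally` are made up for illustration, not part of any library. It assumes that noun tags start with `NN`, as they do in the Penn Treebank tag set.

```python
from collections import Counter

# Sample tagger output copied from the question.
TAGGED = "Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./."


def parse_tagged(line):
    """Split whitespace-separated 'word/TAG' items into (word, tag) pairs."""
    # rsplit on the last '/' so a word that itself contains '/' stays intact.
    return [tuple(item.rsplit("/", 1)) for item in line.split()]


def tally(pairs):
    """Count every tag and collect the nouns in one pass over the pairs."""
    counts = Counter()
    nouns = []
    for word, tag in pairs:
        counts[tag] += 1
        if tag.startswith("NN"):  # assumption: NN, NNS, NNP, NNPS are the noun tags
            nouns.append(word)
    return counts, nouns


pairs = parse_tagged(TAGGED)
counts, nouns = tally(pairs)
print(len(pairs), "tokens,", len(nouns), "noun(s):", nouns)
print(dict(counts))
```

Counting all tags and filtering for nouns happen in the same loop, so checking one more tag adds only a string comparison per token. Most of the time goes into the tagging itself, not the loop.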

    I got a further improvement by disabling a pipeline component I didn't need:

    nlp = spacy.load("en_core_web_sm", disable=["ner"])
    

    With that, the run is faster:

    From 3547 words, there is 913 NOUN found in 6.212834596633911 seconds
    From 3547 words, there is 913 NOUN found in 6.257707595825195 seconds
    From 3547 words, there is 913 NOUN found in 6.371225833892822 seconds
    
