Extracting all Nouns from a text file using nltk

清歌不尽 2020-12-08 08:35

Is there a more efficient way of doing this? My code reads a text file and extracts all Nouns.

import nltk

File = open(fileName) #open file
lines = File.rea         


        
7 Answers
  • 2020-12-08 09:10
    import nltk
    
    lines = 'lines is some string of words'
    # function to test if something is a noun
    is_noun = lambda pos: pos[:2] == 'NN'
    # do the nlp stuff
    tokenized = nltk.word_tokenize(lines)
    nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)] 
    
    print(nouns)
    # ['lines', 'string', 'words']
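
The `pos[:2] == 'NN'` check works because the Penn Treebank tags that `nltk.pos_tag` returns by default use `NN`, `NNS`, `NNP` and `NNPS` for nouns. A minimal sketch of just that filtering step follows. The `extract_nouns` helper and the hand-tagged `sample` list are made up for illustration and stand in for real tagger output, so the snippet runs without NLTK or its data files.

```python
# Hypothetical helper (not part of nltk): keep words whose tag starts with "NN".
def extract_nouns(tagged_words):
    """Return the words from (word, tag) pairs whose tag is a noun tag."""
    return [word for word, tag in tagged_words if tag.startswith("NN")]


# Hand-written (word, tag) pairs, an assumed stand-in for nltk.pos_tag output.
sample = [
    ("Alice", "NNP"),    # proper noun, singular
    ("reads", "VBZ"),    # verb
    ("old", "JJ"),       # adjective
    ("books", "NNS"),    # common noun, plural
    ("in", "IN"),        # preposition
    ("the", "DT"),       # determiner
    ("library", "NN"),   # common noun, singular
]

print(extract_nouns(sample))  # ['Alice', 'books', 'library']
```

With NLTK installed, the same helper could presumably be fed `nltk.pos_tag(nltk.word_tokenize(text))` directly.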
    

    Useful tip: a list comprehension is often faster than building the same list by calling .append() or .insert() inside a 'for' loop.
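
One way to check the tip on your own machine is a rough `timeit` comparison like the sketch below. The word list and the length filter are made-up stand-ins for the tagging step, and the timings will vary by Python version and hardware, so treat the result as indicative only.

```python
import timeit

# Made-up input: 10,000 short strings standing in for tokens.
words = ["word%d" % i for i in range(10000)]


def with_append():
    """Build the filtered list with an explicit loop and .append()."""
    result = []
    for w in words:
        if len(w) > 5:
            result.append(w)
    return result


def with_comprehension():
    """Build the same filtered list with a list comprehension."""
    return [w for w in words if len(w) > 5]


# Both approaches must produce identical output before timing them.
assert with_append() == with_comprehension()

print("append loop:  ", timeit.timeit(with_append, number=100))
print("comprehension:", timeit.timeit(with_comprehension, number=100))
```

For noun extraction, the tagging call itself is likely to dominate the run time, so the gain from the comprehension may be small in practice.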
