Is there a more efficient way of doing this? My code reads a text file and extracts all the nouns.
import nltk
File = open(fileName) #open file
lines = File.rea
import nltk
lines = 'lines is some string of words'
# function to test if something is a noun
def is_noun(pos):
    return pos[:2] == 'NN'
# do the nlp stuff
tokenized = nltk.word_tokenize(lines)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
print(nouns)
>>> ['lines', 'string', 'words']
Useful tip: a list comprehension is often a faster way to build a list than adding elements one at a time with the .append() or .insert() method inside a 'for' loop.
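To check the tip on your own machine, here is a minimal sketch that builds the same noun list both ways and times each with timeit. The names tagged, nouns_with_loop and nouns_with_comprehension are made up for illustration. The (word, tag) pairs are hand-written stand-ins for what nltk.pos_tag would return, so the snippet runs without NLTK installed. Actual timings depend on your interpreter and data, so measure rather than assume.

```python
import timeit

# Hand-written (word, tag) pairs standing in for nltk.pos_tag output (assumption),
# repeated to give the timing loop something to work on.
tagged = [
    ("lines", "NNS"), ("is", "VBZ"), ("some", "DT"),
    ("string", "NN"), ("of", "IN"), ("words", "NNS"),
] * 1000


def is_noun(pos):
    # Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS).
    return pos[:2] == "NN"


def nouns_with_loop(pairs):
    # Build the list by appending inside a for loop.
    result = []
    for word, pos in pairs:
        if is_noun(pos):
            result.append(word)
    return result


def nouns_with_comprehension(pairs):
    # Build the same list with a list comprehension.
    return [word for word, pos in pairs if is_noun(pos)]


if __name__ == "__main__":
    loop_time = timeit.timeit(lambda: nouns_with_loop(tagged), number=20)
    comp_time = timeit.timeit(lambda: nouns_with_comprehension(tagged), number=20)
    print("for loop + append: %.4f s" % loop_time)
    print("list comprehension: %.4f s" % comp_time)
```

Both functions return the same list, so the only thing being compared is how the list is built. For the original question, most of the run time is likely spent in the tagger itself, so this difference may be small in practice.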