Question
If I have a string such as this:
import nltk
text = "They refuse to permit us."
txt = nltk.word_tokenize(text)
If I then print the POS tags with nltk.pos_tag(txt), I get:
[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('.', '.')]
How can I print out only the tags, like this:
['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
Answer 1:
pos_tag returns a list of (word, tag) tuples, so iterate through it and take only the second element of each tuple.
>>> tagged = nltk.pos_tag(txt)
>>> tags = [e[1] for e in tagged]
>>> tags
['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
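The same extraction can also be written with tuple unpacking inside the comprehension, which avoids the numeric index. This is only a minimal sketch: the `tagged` list is hard-coded from the tagger output quoted in this thread so that it runs without NLTK, and it stands in for the result of nltk.pos_tag(txt).

```python
# Stand-in for nltk.pos_tag(txt); values copied from the output quoted in this thread.
tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
          ('permit', 'VB'), ('us', 'PRP'), ('.', '.')]

# Unpack each (word, tag) pair and keep only the tag.
tags = [tag for _word, tag in tagged]
print(tags)
```

Naming both halves of the pair makes it obvious which element is being kept, which helps once the comprehension grows a filter or a transformation.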
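Answer 2:
Take a look at "Unpacking a list / tuple of pairs into two lists / tuples". Using zip(*...) splits the (word, tag) pairs into one tuple of tokens and one tuple of tags: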
>>> from nltk import pos_tag, word_tokenize
>>> text = "They refuse to permit us."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
If the POS tagger later turns out to be slow, you can create the tagger object once and reuse it for every call (see "Slow performance of POS tagging. Can I do some kind of pre-warming?"):
>>> from nltk import word_tokenize
>>> from nltk.tag import PerceptronTagger
>>> tagger = PerceptronTagger()
>>> text = "They refuse to permit us."
>>> tagged_text = tagger.tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
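Two details of the zip(*...) idiom are worth keeping in mind: it yields tuples rather than lists, and unpacking its result into two names raises a ValueError when the input is empty. A minimal sketch of a wrapper that handles both follows; `split_tagged` is a hypothetical helper name, not part of NLTK, and the tagged list is hard-coded from the output above so the example runs without NLTK.

```python
def split_tagged(tagged):
    """Split (word, tag) pairs into two lists.

    Hypothetical helper for illustration; not an NLTK function.
    """
    if not tagged:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    tokens, tags = zip(*tagged)
    return list(tokens), list(tags)


# Stand-in for tagger.tag(word_tokenize(text)); values copied from the output above.
tagged_text = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
               ('permit', 'VB'), ('us', 'PRP'), ('.', '.')]

tokens, pos = split_tagged(tagged_text)
print(pos)
```

Returning lists matches the output format asked for in the question; if tuples are acceptable and the input is never empty, the plain zip(*tagged_text) call is enough.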
Answer 3:
You can iterate over the tagged tuples with a list comprehension and print the result directly:
print([x[1] for x in nltk.pos_tag(txt)])
Source: https://stackoverflow.com/questions/34609285/how-to-print-out-tags-in-python