How to extract nouns using NLTK pos_tag()?

Question

I am fairly new to Python and cannot figure out a bug in my code. I want to extract nouns using NLTK.

I have written the following code:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)


length = len(tagged) - 1

a = list()

for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
      a.append(tagged[i][0])

When I run this, 'a' has only one element:

a
['detail']

I do not understand why.

When I do it without the for loop, that is, by running

log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])

and changing the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop it only returns the last element. Can someone tell me what is going wrong with the for loop?

'a' should be as follows after the code runs:

['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']

Answer 1:

for i in (0,length):

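This iterates over only two elements, 0 and length, because (0,length) is a tuple. If you want to iterate over every index, use range. Note that range excludes its end value, and length is len(tagged) - 1, so pass len(tagged) to include the last token:

for i in range(len(tagged)):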
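
A minimal sketch of the difference, using a hand-written list in place of real pos_tag() output (the tokens and tags are illustrative assumptions, not actual tagger results). It also shows that range(0, length) stops one short when length is len(tagged) - 1:

```python
# Hand-written (token, tag) pairs standing in for pos_tag() output (illustrative only).
tagged = [("At", "IN"), ("eight", "CD"), ("Thursday", "NNP"), ("film", "NN"), ("design", "NN")]

length = len(tagged) - 1

# (0, length) is a tuple, so the loop body runs exactly twice: for 0 and for length.
tuple_indices = [i for i in (0, length)]

# range(0, length) stops before length, so the last index is skipped.
short_indices = list(range(0, length))

# range(len(tagged)) covers every index.
all_indices = list(range(len(tagged)))

print(tuple_indices)  # [0, 4]
print(short_indices)  # [0, 1, 2, 3]
print(all_indices)    # [0, 1, 2, 3, 4]
```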

Better yet, it is more idiomatic to iterate directly over the elements of a sequence rather than over its indices. This reduces the likelihood of mistakes like the one above.

for item in tagged:
    if item[1][0] == 'N':
        a.append(item[0])

Size-conscious users may even prefer the one-line list comprehension:

a = [item[0] for item in tagged if item[1][0] == 'N']

Answer 2:

>>> from nltk import word_tokenize, pos_tag
>>> sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
>>> nouns = [token for token, pos in pos_tag(word_tokenize(sentence)) if pos.startswith('N')]
>>> nouns
['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']
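
If the filter is needed in more than one place, it could be wrapped in a small helper. The sketch below is only one possible shape: the name extract_nouns and the NOUN_TAGS set are hypothetical, and the sample pairs are hand-written stand-ins for pos_tag() output, so a real tagger may assign different tags. It matches the four Penn Treebank noun tags explicitly instead of testing for a leading 'N':

```python
from typing import Iterable, List, Tuple

# Penn Treebank noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the tokens whose tag is one of the Penn Treebank noun tags."""
    return [token for token, pos in tagged if pos in NOUN_TAGS]


# Hand-written pairs standing in for pos_tag(word_tokenize(sentence)).
sample = [
    ("At", "IN"),
    ("Thursday", "NNP"),
    ("film", "NN"),
    ("best", "JJS"),
    ("beautiful", "JJ"),
    ("Ram", "NNP"),
]

nouns = extract_nouns(sample)
print(nouns)  # ['Thursday', 'film', 'Ram']
```

With NLTK installed and its tokenizer and tagger data downloaded, the same helper would be called as extract_nouns(pos_tag(word_tokenize(sentence))).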

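Answer 3:

This line loops only twice:

for i in (0,length):

once with i = 0 and once with i = length.

What you want is

for i in range(len(tagged)):

Note that range excludes its end value, and length is len(tagged) - 1, so range(length) would skip the last token.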

Answer 4:

Try this:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)

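# range() excludes its end value, so use the full length to reach the last token
length = len(tagged)

a = list()

for i in range(0, length):
    # Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with 'N'
    if tagged[i][1][0] == 'N':
        a.append(tagged[i][0])
print(a)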

Source: https://stackoverflow.com/questions/24409642/how-to-extract-nouns-using-nltk-pos-tag

Tags

python

nlp

nltk