How to extract nouns using NLTK pos_tag()?

本小妞迷上赌 submitted on 2019-12-18 04:21:29

Question


I am fairly new to Python and cannot figure out this bug. I want to extract nouns using NLTK.

I have written the following code:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)


length = len(tagged) - 1

a = list()

for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
        a.append(tagged[i][0])

When I run this, 'a' has only one element:

a
['detail']

I do not understand why.

When I run the loop body without the for loop, that is,

log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])

changing the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop only the last element is returned. Can someone tell me what is going wrong with the for loop?

After the code runs, 'a' should be:

['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']

Answer 1:


for i in (0,length):

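This iterates over exactly two values, 0 and length, because (0, length) is a tuple. If you want to iterate over every index, use range. Note that range excludes its end value and length is len(tagged) - 1, so range(0, length) would skip the last token; range(len(tagged)) covers them all.

for i in range(len(tagged)):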
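
To make the difference concrete, here is a minimal sketch that compares looping over the tuple (0, length) with looping over a range. It does not call NLTK: the `tagged` list is a hand-written stand-in for the output of nltk.pos_tag(), and the tags in it are assumptions chosen for illustration.

```python
# Hypothetical stand-in for nltk.pos_tag() output; the tags are written by hand.
tagged = [("At", "IN"), ("eight", "CD"), ("Thursday", "NNP"),
          ("film", "NN"), ("design", "NN")]

length = len(tagged) - 1

# (0, length) is a two-element tuple, so a loop over it visits only 0 and length.
tuple_indices = [i for i in (0, length)]

# range(len(tagged)) yields every valid index: 0, 1, ..., len(tagged) - 1.
range_indices = [i for i in range(len(tagged))]

# Same noun test as in the question, applied with each set of indices.
nouns_from_tuple = [tagged[i][0] for i in (0, length) if tagged[i][1][0] == "N"]
nouns_from_range = [tagged[i][0] for i in range(len(tagged)) if tagged[i][1][0] == "N"]

print(tuple_indices)     # [0, 4]
print(range_indices)     # [0, 1, 2, 3, 4]
print(nouns_from_tuple)  # ['design']
print(nouns_from_range)  # ['Thursday', 'film', 'design']
```

With the tuple, only the first and last tokens are ever inspected, which is why the list ends up holding at most the final noun.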

Better yet, it is more idiomatic to iterate directly over the elements of a sequence rather than over its indices. This reduces the likelihood of mistakes like the one above.

for item in tagged:
    if item[1][0] == 'N':
        a.append(item[0])

Size-conscious users may even prefer the one-line list comprehension:

a = [item[0] for item in tagged if item[1][0] == 'N']



Answer 2:


>>> from nltk import word_tokenize, pos_tag
>>> sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
>>> nouns = [token for token, pos in pos_tag(word_tokenize(sentence)) if pos.startswith('N')]
>>> nouns
['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']
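
The startswith('N') test works because the noun tags returned by the default English tagger follow the Penn Treebank convention (NN, NNS, NNP, NNPS). As a rough sketch, the same idea can be extended to separate common nouns from proper nouns. The helper name `split_nouns` and the hand-tagged `sample` list are hypothetical and only serve the illustration; real input would come from pos_tag().

```python
# Penn Treebank noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
COMMON_NOUN_TAGS = NOUN_TAGS - PROPER_NOUN_TAGS


def split_nouns(tagged):
    """Return (common, proper) noun lists from a list of (token, tag) pairs."""
    common = [token for token, pos in tagged if pos in COMMON_NOUN_TAGS]
    proper = [token for token, pos in tagged if pos in PROPER_NOUN_TAGS]
    return common, proper


# Hand-written sample standing in for pos_tag(word_tokenize(sentence)).
sample = [("on", "IN"), ("Thursday", "NNP"), ("film", "NN"),
          ("beautiful", "JJ"), ("Ram", "NNP"), ("designs", "NNS")]

common, proper = split_nouns(sample)
print(common)  # ['film', 'designs']
print(proper)  # ['Thursday', 'Ram']
```

Matching against an explicit tag set is slightly stricter than startswith('N'): it cannot accidentally match some other tag that happens to begin with N in a different tagset.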



Answer 3:


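This line will only loop twice:

for i in (0,length):

Once with i = 0 and once with i = length.

What you want is a range over every index. Because length is len(tagged) - 1 and range excludes its end value, range(length) would miss the last token, so use:

for i in range(len(tagged)):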



Answer 4:


Try this:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)

a = list()

# range(len(tagged)) visits every index, including the last one
for i in range(len(tagged)):
    # keep tokens whose tag starts with 'N' (noun tags)
    if tagged[i][1][0] == 'N':
        a.append(tagged[i][0])

print(a)


Source: https://stackoverflow.com/questions/24409642/how-to-extract-nouns-using-nltk-pos-tag
