Detecting POS tag pattern along with specified words

Submitted by 家住魔仙堡 on 2019-12-11 12:52:43

Question


I need to identify certain POS tags before or after certain specified words. For example, the following tagged sentence:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

can be abstracted to the form "would be" + Adjective

Similarly:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

is of the form "am able to" + Verb

How can I check for this type of pattern in sentences? I am using NLTK.


Answer 1:


Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

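def would_be(tagged):
    # Slide a 3-token window over the sentence: word, word, tag.
    # range replaces the Python 2-only xrange.
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))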

The input is a POS tagged sentence (list of tuples, as per NLTK).

It checks if there are any three elements in the list such that "would" is next to "be" and "be" is next to a word tagged as an adjective ('JJ'). It will return True as soon as this "pattern" is matched.
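If you also need to know where the pattern occurs, the same window comparison can return a position instead of a boolean. This is a minimal sketch, not part of the original answer; the name find_would_be_adj is hypothetical, and it assumes the same list-of-tuples input.

```python
def find_would_be_adj(tagged):
    """Return the index where 'would' 'be' + adjective starts, or -1 if absent."""
    for i in range(len(tagged) - 2):
        # Unpack a window of three (word, tag) tuples.
        (w1, _), (w2, _), (_, t3) = tagged[i:i + 3]
        if (w1, w2, t3) == ('would', 'be', 'JJ'):
            return i
    return -1


s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

start = find_would_be_adj(s1)
if start != -1:
    # The adjective is the third token of the matched window.
    print("adjective:", s1[start + 2][0])
```

Sentences shorter than three tokens produce an empty range, so the function falls through and returns -1 without raising an IndexError.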

You can do something very similar for the second type of sentence:

def am_able_to(tagged):
    # Slide a 4-token window over the sentence: word, word, word, tag.
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

Here's a driver for the program:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

To generalize this, decide for each position in the pattern whether to compare the literal word (index 0 of each tuple) or its POS tag (index 1).
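One way to sketch that generalization is to describe a pattern as a list of ('word', value) or ('tag', value) pairs and compare it against every window of the sentence. The function matches_pattern and the pattern format below are assumptions made for illustration, not an NLTK API. Lowercasing the word before comparing is also an added choice, so that a sentence-initial "Would" still matches.

```python
def matches_pattern(tagged, pattern):
    """Return True if `pattern` matches some contiguous run of `tagged`.

    `tagged` is a list of (word, tag) tuples.
    `pattern` is a list of ('word', value) or ('tag', value) pairs.
    """
    n = len(pattern)
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(
            # Compare the lowercased word or the exact tag, per pattern element.
            (word.lower() == value) if kind == 'word' else (tag == value)
            for (word, tag), (kind, value) in zip(window, pattern)
        ):
            return True
    return False


WOULD_BE_ADJ = [('word', 'would'), ('word', 'be'), ('tag', 'JJ')]
AM_ABLE_TO_VERB = [('word', 'am'), ('word', 'able'), ('word', 'to'), ('tag', 'VB')]

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

print(matches_pattern(s1, WOULD_BE_ADJ))
print(matches_pattern(s2, AM_ABLE_TO_VERB))
```

Tags are compared exactly here, mirroring the functions above. If inflected verb tags such as 'VBD' or 'VBZ' should also count as verbs, the tag comparison could be swapped for tag.startswith(value).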



Source: https://stackoverflow.com/questions/34672986/detecting-pos-tag-pattern-along-with-specified-words
