match POS tag and sequence of words

我寻月下人不归 2020-12-22 06:30

I have the following two strings with their POS tags:

Sent1: "something like how writer pro or phraseology works would be really cool."

5 answers
  • 2020-12-22 07:12

    Check the StackOverflow link.

    import nltk
    from nltk.tokenize import word_tokenize

    def would_be(tagged):
        # True if some "would", "be" is immediately followed by a word tagged JJ
        return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]]
                   for i in range(len(tagged) - 2))

    S = "more options like the syntax editor would be nice."
    pos = nltk.pos_tag(word_tokenize(S))
    print(would_be(pos))
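    Note that this trigram check only matches when the adjective comes immediately after "be". With the tagging of Sent1 listed in the third answer, "really" is tagged RB and sits between "be" and "cool", so it returns False there. A minimal sketch of a variant that skips over adverbs, assuming the Penn Treebank tags used by nltk.pos_tag (RB/RBR/RBS for adverbs, JJ/JJR/JJS for adjectives); the function name `would_be_adj` is hypothetical:

```python
ADVERB_TAGS = {"RB", "RBR", "RBS"}
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}

def would_be_adj(tagged):
    """Return True if "would be" is followed by an adjective, optionally
    with adverbs (e.g. "really") in between.

    `tagged` is a list of (word, tag) pairs as returned by nltk.pos_tag.
    """
    words = [w.lower() for w, _ in tagged]
    for i in range(len(tagged) - 1):
        if words[i] == "would" and words[i + 1] == "be":
            # Skip any adverbs directly after "be"
            k = i + 2
            while k < len(tagged) and tagged[k][1] in ADVERB_TAGS:
                k += 1
            # The next non-adverb token must be an adjective
            if k < len(tagged) and tagged[k][1] in ADJECTIVE_TAGS:
                return True
    return False

# Tags for Sent1 as listed in the third answer
sent1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
         ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
         ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
print(would_be_adj(sent1))  # True
```

    With a live tagger you would pass `nltk.pos_tag(word_tokenize(S))`, exactly as in the answer's own code.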
    

    Also check this code:

    import nltk
    from nltk.tokenize import word_tokenize

    def checkTag(S):
        pos = nltk.pos_tag(word_tokenize(S))
        # Is there an adjective (JJ) anywhere? (its position is not checked)
        flag = any(tag == 'JJ' for _, tag in pos)
        if flag:
            # Look for "would" immediately followed by "be"
            for ind in range(len(pos) - 1):
                if pos[ind][0] == 'would' and pos[ind+1][0] == 'be':
                    return True
        return False

    S = "something like how writer pro or phraseology works would be really cool."
    print(checkTag(S))
    
  • 2020-12-22 07:13

    First, install nltk_cli following the instructions at: https://github.com/alvations/nltk_cli

    Then, here's a lesser-known function in nltk_cli that you may find useful:

    alvas@ubi:~/git/nltk_cli$ cat infile.txt 
    something like how writer pro or phraseology works would be really cool .
    more options like the syntax editor would be nice
    alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt 
    would be    really cool
    would be    nice
    

    To illustrate other possible uses:

    alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt 
    !!! NO CHUNK of VP+VP in this sentence !!!
    !!! NO CHUNK of VP+VP in this sentence !!!
    alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt 
    how writer pro or phraseology works would be
    the syntax editor   would be
    alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt 
    !!! NO CHUNK of VP+NP in this sentence !!!
    !!! NO CHUNK of VP+NP in this sentence !!!
    

    Then, if you want to check whether the phrase occurs in a sentence and output True/False, simply read and iterate through the nltk_cli output and check each line with if-else conditions.
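    A minimal sketch of that last step, assuming (based only on the transcript above) that `senna.py --chunk2` prints one line per input sentence: either the matched chunk or a line starting with `!!! NO CHUNK`. The helper name `chunk_lines_to_bools` is hypothetical, and it works on captured text rather than calling nltk_cli itself:

```python
def chunk_lines_to_bools(output):
    """Map each line of `senna.py --chunk2` output to True (a chunk was
    printed) or False (the "!!! NO CHUNK ..." marker line).

    Assumes one non-blank output line per input sentence.
    """
    results = []
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        results.append(not line.startswith("!!! NO CHUNK"))
    return results

# Output copied from the VP+ADJP and VP+NP runs above
vp_adjp = "would be    really cool\nwould be    nice\n"
vp_np = ("!!! NO CHUNK of VP+NP in this sentence !!!\n"
         "!!! NO CHUNK of VP+NP in this sentence !!!\n")

print(chunk_lines_to_bools(vp_adjp))  # [True, True]
print(chunk_lines_to_bools(vp_np))    # [False, False]
```

    To get the text in the first place, you could run the same command from the nltk_cli checkout and capture it with `subprocess.run([...], capture_output=True, text=True).stdout`. If the tool ever printed an empty line for a sentence, the one-line-per-sentence assumption would break.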

  • 2020-12-22 07:22

    Would this help?

    s1=[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]

    flag = False  # becomes True once "would be" has been seen
    for i, j in zip(s1[:-1], s1[1:]):
        if i[0] + " " + j[0] == "would be":
            flag = True
        # After "would be", report the first adjective found
        if flag and (i[-1] == "JJ" or j[-1] == "JJ"):
            print("would be adjective found in the tagged string")
            break
    
  • 2020-12-22 07:25

    It seems you just need to search consecutive tags for "would" followed by "be", and then for the first instance of the tag "JJ". Something like this:

    import nltk

    def has_would_be_adj(S):
        # Make POS tags (word_tokenize splits "cool." into "cool" and ".")
        pos = nltk.pos_tag(nltk.word_tokenize(S))
        # Search consecutive tags for "would", "be"
        j = None  # index of found "would"
        for i, (x, y) in enumerate(zip(pos[:-1], pos[1:])):
            if x[0] == "would" and y[0] == "be":
                j = i
                break
        if j is None:
            return False
        a = None  # index of found adjective
        for i, (word, tag) in enumerate(pos[j + 2:]):
            if tag == "JJ":
                a = i + j + 2  # convert slice offset back to an index into pos
                break
        if a is None:
            return False
        print("Found adjective {} at {}".format(pos[a], a))
        return True

    S = "something like how writer pro or phraseology works would be really cool."
    print(has_would_be_adj(S))
    

    I'm sure this could be written more compactly and cleanly, but it does what it says on the box :)
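    For comparison, here is one more compact way to express the same search (find the first "would" "be" pair, then look for a "JJ" tag anywhere after it). It takes an already-tagged list so it can be tried without NLTK; `has_would_be_adj_compact` is a hypothetical name, and like the version above it only considers the first "would be" in the sentence:

```python
def has_would_be_adj_compact(pos):
    """pos: list of (word, tag) pairs, e.g. from nltk.pos_tag."""
    words = [w for w, _ in pos]
    # Index of the first "would" immediately followed by "be", or None
    j = next((i for i in range(len(words) - 1)
              if words[i] == "would" and words[i + 1] == "be"), None)
    if j is None:
        return False
    # Is any token after "be" tagged JJ?
    return any(tag == "JJ" for _, tag in pos[j + 2:])

# Tags for Sent1 as listed in the third answer
sent1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
         ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
         ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
print(has_would_be_adj_compact(sent1))  # True
```

    `next()` with a default of `None` replaces the explicit loop-and-break, and `any()` replaces the second loop; the trade-off is that it no longer reports where the adjective was found.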

  • 2020-12-22 07:32

    from itertools import tee, dropwhile
    import nltk

    def check_sentence(S):
        def pairwise(iterable):
            "s -> (s0,s1), (s1,s2), (s2, s3), ..."
            a, b = tee(iterable)
            next(b, None)
            return zip(a, b)

        def not_would_be(word_group):
            # True while the pair is NOT ("would", "be"); dropwhile skips these
            (would_word, _), (be_word, _) = word_group
            return would_word.lower() != "would" or be_word.lower() != "be"

        tagged = nltk.pos_tag(nltk.word_tokenize(S))
        # Iterate over pairs from the first ("would", "be") onward
        for (_, pos1), (_, pos2) in dropwhile(not_would_be, pairwise(tagged)):
            if pos1 == "JJ" or pos2 == "JJ":
                return True
        return False
    

    You can then use the function like so:

    S = "more options like the syntax editor would be nice."
    print(check_sentence(S))
    