how to write spacy matcher of POS regex

情歌与酒 2021-01-13 01:32

spaCy has two features I'd like to combine: part-of-speech (POS) tagging and rule-based matching.

How can I combine them in a neat way?

For example - let's say i

2 answers
  •  感动是毒
    2021-01-13 02:17

    Eyal Shulman's answer was helpful, but it requires hard-coding a Matcher pattern rather than writing an actual regular expression.

    I wanted to use regular expressions, so I made my own solution:

        import re

        ## example verb-phrase pattern over "<POS>" tags
        ## (the tag names here are assumed; adjust them to your needs)
        pattern = r'(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*'

        ## doc, start and end come from the surrounding code
        sentence = doc[start:end].sent

        ## create a string with the pos of the sentence
        posString = ""
        for w in sentence:
            posString += "<" + w.pos_ + ">"

        lstVerb = []
        for m in re.compile(pattern).finditer(posString):
            ## each m is a verb phrase match, starting at character offset m.start()
            ## count the "<" in m to find how many tokens we want
            numTokensInGroup = m.group().count('<')

            ## then find the number of tokens that came before that group
            numTokensBeforeGroup = posString[:m.start()].count('<')

            verbPhrase = sentence[numTokensBeforeGroup:numTokensBeforeGroup+numTokensInGroup]
            lstVerb.append(verbPhrase)
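To make the offset arithmetic easier to check, here is a minimal sketch of the same idea that works on a plain list of POS tags, so it runs without spaCy. The helper name `find_pos_spans`, the `VERB_PHRASE` pattern and the sample tags are all made up for illustration; they are not part of spaCy or of the answer above.

```python
import re


def find_pos_spans(pos_tags, pattern):
    """Return (start, end) token index pairs whose POS sequence matches pattern."""
    # Encode every tag as "<TAG>" so that one token is exactly one bracketed unit.
    pos_string = "".join(f"<{tag}>" for tag in pos_tags)
    spans = []
    for m in re.finditer(pattern, pos_string):
        if m.start() == m.end():
            continue  # ignore empty matches, which a pattern made only of '*' groups can produce
        # Tokens before the match = number of "<" before its character offset.
        start = pos_string[:m.start()].count("<")
        # Tokens inside the match = number of "<" in the matched text.
        length = m.group().count("<")
        spans.append((start, start + length))
    return spans


# Assumed verb-phrase pattern (hypothetical, adjust the tags as needed).
VERB_PHRASE = r"(<VERB>)*(<ADV>)*(<PART>)*(<VERB>)+(<PART>)*"

# Hand-written tags, roughly "She wants to leave early ."
tags = ["PRON", "VERB", "PART", "VERB", "ADV", "PUNCT"]
spans = find_pos_spans(tags, VERB_PHRASE)
print(spans)
```

With a spaCy sentence `sent`, the same helper could be fed `[t.pos_ for t in sent]`, and each returned pair used to slice the tokens back out as `sent[start:end]`.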
    
