Question
I'm using spaCy to find sentences that contain 'is' or 'was' with a pronoun as the subject, and to return the object of the sentence. My code works, but I feel like there must be a much better way to do this.
import spacy

nlp = spacy.load('en_core_web_sm')
ex_phrase = nlp("He was a genius. I really liked working with him. He is a dog owner. She is very kind to animals.")

#create an empty list to hold any instance of this particular construction
list_of_responses = []

#split into sentences
for sent in ex_phrase.sents:
    for token in sent:
        #check to see if the word 'was' or 'is' is in each sentence, if so, make a list of the verb's constituents
        if token.text == 'was' or token.text == 'is':
            dependency = [child for child in token.children]

            #if the first constituent is a pronoun, make sent_object equal to the item at index 1 in the list of constituents
            if dependency[0].pos_ == 'PRON':
                sent_object = dependency[1]

                #create a string of the entire object of the verb. For instance, if sent_object = 'genius', this would create a string 'a genius'
                for token in sent:
                    if token == sent_object:
                        whole_constituent = [t.text for t in token.subtree]
                        whole_constituent = " ".join(whole_constituent)

                        #check to see what the pronoun was, and depending on if it was 'he' or 'she', construct a coherent followup sentence
                        if dependency[0].text.lower() == 'he':
                            returning_phrase = f"Why do you think him being {whole_constituent} helped the two of you get along?"
                        elif dependency[0].text.lower() == 'she':
                            returning_phrase = f"Why do you think her being {whole_constituent} helped the two of you get along?"

                        #add each followup sentence to the list. For some reason it creates a lot of duplicates, so I have to use set
                        list_of_responses.append(returning_phrase)

list_of_responses = list(set(list_of_responses))
Answer 1:
It seems like your code is doing something more complicated than what you describe in your question, so I've tried to do what it looks like your code is meant to do. Getting the object/attribute of the verb "is" or "was" is only part of it.
import spacy
from pprint import pprint

nlp = spacy.load('en_core_web_sm')

text = "He was a genius. I really liked working with him. He is a dog owner. She is very kind to animals."

def get_pro_nsubj(token):
    # get the (lowercased) subject pronoun if there is one
    return [child.lower_ for child in token.children if child.dep_ == 'nsubj'][0]

list_of_responses = []

# a mapping of subject to object pronouns
subj_obj_pro_map = {'he': 'him',
                    'she': 'her'
                    }

for token in nlp(text):
    if token.pos_ in ['NOUN', 'ADJ']:
        if token.dep_ in ['attr', 'acomp'] and token.head.lower_ in ['is', 'was']:
            # to test for lemma 'be' use token.head.lemma_ == 'be'
            nsubj = get_pro_nsubj(token.head)
            if nsubj in ['he', 'she']:
                # get the text of each token in the constituent and join it all together
                whole_constituent = ' '.join([t.text for t in token.subtree])
                obj_pro = subj_obj_pro_map[nsubj]  # convert subject to object pronoun
                returning_phrase = 'Why do you think {} being {} helped the two of you get along?'.format(obj_pro, whole_constituent)
                list_of_responses.append(returning_phrase)

pprint(list_of_responses)
Which outputs this:
['Why do you think him being a genius helped the two of you get along?',
'Why do you think him being a dog owner helped the two of you get along?',
'Why do you think her being very kind to animals helped the two of you get '
'along?']
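If you want to avoid checking surface forms like 'is'/'was' in a hand-written loop, the same pattern can also be expressed declaratively. The sketch below is not from the original answer; it assumes spaCy v3's DependencyMatcher and matches a verb with lemma 'be' that has a pronoun subject ('he'/'she') and an attr/acomp child. Names like "be_verb" and "PRON_BE_ATTR" are arbitrary pattern identifiers.

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load('en_core_web_sm')
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    # anchor token: any form of "be"
    {"RIGHT_ID": "be_verb", "RIGHT_ATTRS": {"LEMMA": "be"}},
    # a subject pronoun "he" or "she" attached to the verb
    {"LEFT_ID": "be_verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj", "LOWER": {"IN": ["he", "she"]}}},
    # the attribute or adjectival complement of the verb
    {"LEFT_ID": "be_verb", "REL_OP": ">", "RIGHT_ID": "attribute",
     "RIGHT_ATTRS": {"DEP": {"IN": ["attr", "acomp"]}}},
]
matcher.add("PRON_BE_ATTR", [pattern])

doc = nlp("He was a genius. She is very kind to animals.")
for match_id, token_ids in matcher(doc):
    # token_ids come back in pattern order: be_verb, subject, attribute
    subj, attr = doc[token_ids[1]], doc[token_ids[2]]
    whole_constituent = ' '.join(t.text for t in attr.subtree)
    print(subj.lower_, '->', whole_constituent)

This keeps the linguistic pattern in a single data structure, so supporting more subject pronouns or other copular verbs only means editing the pattern, not the loop logic.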
Source: https://stackoverflow.com/questions/56977820/better-way-to-use-spacy-to-parse-sentences