How to split an NLP parse tree into clauses (independent and subordinate)?

北海茫月 2020-12-29 14:35

Given an NLP parse tree like

(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP         


        
2 answers
  •  滥情空心
    2020-12-29 15:28

    First, get the parse tree:

    # One-time setup (downloads CoreNLP):
    # import stanza; stanza.install_corenlp()
    
    from stanza.server import CoreNLPClient
    
    text = "Joe realized that the train was late while he waited at the train station"
    
    with CoreNLPClient(
            annotators=['tokenize', 'pos', 'lemma', 'parse', 'depparse'],
            output_format="json",
            timeout=30000,
            memory='16G') as client:
        output = client.annotate(text)
        # With output_format="json", `output` is a dict
        parse_tree = output['sentences'][0]['parse']
        # Collapse the multi-line, indented tree into a single line
        parse_tree = ' '.join(parse_tree.split())
    

    Then use this gist to extract clauses by calling:

    print_clauses(parse_str=parse_tree)
    

    The output will be:

    {'the train was late', 'he waited at the train station', 'Joe realized'}
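
The gist itself is not reproduced in this post. As a rough idea of what such a function might do, here is a minimal sketch in plain Python. It is not the gist's code: the names `parse_bracketed`, `clause_words` and `find_clauses` are hypothetical, and the bracketed string is hand-written for illustration rather than real CoreNLP output. The assumed rule is that every `S` node is one clause, and that words inside an embedded `S` or `SBAR` (including subordinators such as "that" and "while") belong to the inner clause, not the outer one.

```python
def parse_bracketed(s):
    """Parse a Penn-style bracketed string into (label, children) tuples.

    Leaves (the words) are kept as plain strings.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read()


def clause_words(node):
    """Collect the words under `node`, skipping embedded S/SBAR subtrees."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        elif child[0] not in ("S", "SBAR"):
            words.extend(clause_words(child))
    return words


def find_clauses(node):
    """Return one string per S node, in top-down, left-to-right order."""
    label, children = node
    clauses = []
    if label == "S":
        words = clause_words(node)
        if words:
            clauses.append(" ".join(words))
    for child in children:
        if not isinstance(child, str):
            clauses.extend(find_clauses(child))
    return clauses


# Hand-written tree for the example sentence (an assumption, not parser output).
parse_str = (
    "(ROOT (S (NP (NNP Joe)) (VP (VBD realized) "
    "(SBAR (IN that) (S (NP (DT the) (NN train)) (VP (VBD was) (ADJP (JJ late)) "
    "(SBAR (IN while) (S (NP (PRP he)) (VP (VBD waited) "
    "(PP (IN at) (NP (DT the) (NN train) (NN station))))))))))))"
)

print(set(find_clauses(parse_bracketed(parse_str))))
```

On this hand-written tree the sketch yields the same three clauses as above. A real parser may attach the "while" clause differently, and labels such as `SINV` or `SQ` are not handled here, so treat the rule as a starting point only.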
    
