Given an NLP parse tree like
(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP
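First, get the parse tree from CoreNLP through Stanza's `CoreNLPClient`:

```python
# Run once to download CoreNLP if it is not installed yet:
# import stanza
# stanza.install_corenlp()
from stanza.server import CoreNLPClient

text = "Joe realized that the train was late while he waited at the train station"

with CoreNLPClient(
        annotators=['tokenize', 'pos', 'lemma', 'parse', 'depparse'],
        output_format="json",
        timeout=30000,
        memory='16G') as client:
    # With output_format="json", annotate() returns a plain dict
    output = client.annotate(text)

# print(output['sentences'][0])
# Bracketed constituency parse of the first sentence
parse_tree = output['sentences'][0]['parse']
# CoreNLP pretty-prints the tree over several lines; collapse all whitespace to single spaces
parse_tree = ' '.join(parse_tree.split())
```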
Then use the `print_clauses` function from this gist to extract the clauses:

```python
print_clauses(parse_str=parse_tree)
```

The output will be:

```
{'the train was late', 'he waited at the train station', 'Joe realized'}
```
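To give a rough idea of how this kind of extraction can work, here is a minimal, dependency-free sketch. It is not the gist's implementation: it parses the bracketed string, then for every `S` node collects the words that are not inside a nested `S` or `SBAR`. The function names `parse_bracketed` and `extract_clauses` and the label sets are hypothetical, and the hand-written tree is a plausible parse of the example sentence assumed for illustration, not actual CoreNLP output.

```python
# Hypothetical sketch; NOT the gist's code. A node is (label, children);
# a leaf is a plain string token.

CLAUSE_LABELS = {"S", "SINV", "SQ"}          # assumed clause-level labels
BOUNDARY_LABELS = CLAUSE_LABELS | {"SBAR"}   # stop collecting words at these


def parse_bracketed(s):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the root node")
    return tree


def words_outside_clauses(node):
    """Collect the node's words, skipping any nested clause or SBAR subtree."""
    words = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        elif child[0] not in BOUNDARY_LABELS:
            words.extend(words_outside_clauses(child))
    return words


def extract_clauses(tree):
    """Return one string per clause node, built from the words that clause owns."""
    clauses = set()

    def visit(node):
        if node[0] in CLAUSE_LABELS:
            words = words_outside_clauses(node)
            if words:
                clauses.add(" ".join(words))
        for child in node[1]:
            if not isinstance(child, str):
                visit(child)

    visit(tree)
    return clauses


# Hand-written, assumed parse of the example sentence (not real CoreNLP output)
TREE = """
(ROOT
  (S
    (NP (NNP Joe))
    (VP (VBD realized)
      (SBAR (IN that)
        (S
          (NP (DT the) (NN train))
          (VP (VBD was)
            (ADJP (JJ late))
            (SBAR (IN while)
              (S
                (NP (PRP he))
                (VP (VBD waited)
                  (PP (IN at)
                    (NP (DT the) (NN train) (NN station))))))))))))
"""

print(extract_clauses(parse_bracketed(TREE)))
```

Because `SBAR` is treated as a boundary, subordinators such as "that" and "while" belong to no clause, so they do not appear in the result. Real CoreNLP trees may also contain punctuation leaves (for example a final "."), which this sketch does not filter out.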