Extract parent and child node from python tree

Question

I am using nltk's Tree data structure. Below is a sample nltk.Tree.

(S
  (S
    (ADVP (RB recently))
    (NP (NN someone))
    (VP
      (VBD mentioned)
      (NP (DT the) (NN word) (NN malaria))
      (PP (TO to) (NP (PRP me)))))
  (, ,)
  (CC and)
  (IN so)
  (S
    (NP
      (NP (CD one) (JJ whole) (NN flood))
      (PP (IN of) (NP (NNS memories))))
    (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))
  (. .))

I am not familiar with the nltk.Tree data structure. For every leaf node I want to extract the label of its parent and of its grandparent (the "super parent"), e.g. for 'recently' I want (ADVP, RB), and for 'someone' it is (NP, NN). The final output I want is shown below. An earlier answer used the eval() function to do this, which I want to avoid.

[('ADVP', 'RB'), ('NP', 'NN'), ('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('PP', 'TO'), ('NP', 'PRP'), ('S', 'CC'), ('S', 'IN'), ('NP', 'CD'), ('NP', 'JJ'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('VP', 'VBD'), ('VP', 'VBG'), ('ADVP', 'RB')]

Answer 1:

Here is Python code that does this without the eval function, using the nltk Tree data structure:

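import nltk

sentences = """(S
  (S
    (ADVP (RB recently))
    (NP (NN someone))
    (VP
      (VBD mentioned)
      (NP (DT the) (NN word) (NN malaria))
      (PP (TO to) (NP (PRP me)))))
  (, ,)
  (CC and)
  (IN so)
  (S
    (NP
      (NP (CD one) (JJ whole) (NN flood))
      (PP (IN of) (NP (NNS memories))))
    (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))
  (. .))"""


def tails(items, path=()):
    """Yield the last two labels on the path down to each leaf."""
    for child in items:
        if isinstance(child, nltk.Tree):
            if child.label() in {".", ","}:  # ignore punctuation
                continue
            for result in tails(child, path + (child.label(),)):
                yield result
        else:
            # child is a leaf (a plain string): report its two nearest ancestors
            yield path[-2:]


# Parse the bracketed string into an nltk.Tree; no eval() is involved.
tree = nltk.Tree.fromstring(sentences)
# Seed the path with the root label so top-level tokens yield e.g. ('S', 'CC').
print(list(tails(tree, (tree.label(),))))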
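
To see what the recursion is doing without depending on nltk, the same idea can be sketched with only the standard library. This is an illustration, not nltk's API: the helper names parse_brackets and parent_pairs are made up here, and the sketch assumes well-formed, balanced bracketing like the sample above.

```python
import re


def parse_brackets(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves are plain strings. Assumes balanced, well-formed input
    (hypothetical helper, not part of nltk).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested subtree
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return node()


def parent_pairs(tree, path=(), skip=frozenset({".", ","})):
    """Yield the last two labels above each leaf, skipping punctuation tags."""
    label, children = tree
    if label in skip:
        return
    path = path + (label,)
    for child in children:
        if isinstance(child, str):
            yield path[-2:]
        else:
            yield from parent_pairs(child, path)


sample = "(S (NP (NN someone)) (VP (VBD came)) (CC and) (. .))"
pairs = list(parent_pairs(parse_brackets(sample)))
print(pairs)
```

Because the root label is pushed onto the path before descending, a token that sits directly under the root (such as "and") is paired with the root label rather than producing a one-element tuple.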

Source: https://stackoverflow.com/questions/29397460/extract-parent-and-child-node-from-python-tree

Tags

python

tree

nltk

stanford-nlp