Question
I am trying to build a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I can get the output below by following the linked question: How do I do dependency parsing in NLTK?
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
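For reference, output lines in this `rel(gov-i, dep-j)` shape can be turned into (relation, governor, dependent) tuples with a small regular expression. This is only a minimal sketch: the helper name `parse_dependency_lines` is hypothetical, and it assumes every line follows exactly the format shown above. Note that it does not produce a ('ROOT', 'ROOT', word) entry, which would have to be added separately.

```python
import re

# Matches lines such as "nsubj(shot-2, I-1)":
# relation(governor-index, dependent-index)
DEP_LINE = re.compile(r"^([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)$")

def parse_dependency_lines(lines):
    """Turn 'rel(gov-i, dep-j)' strings into (rel, gov, dep) tuples."""
    triples = []
    for line in lines:
        match = DEP_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a dependency line
        rel, gov, _gov_idx, dep, _dep_idx = match.groups()
        triples.append((rel, gov, dep))
    return triples

lines = [
    "nsubj(shot-2, I-1)",
    "det(elephant-4, an-3)",
    "dobj(shot-2, elephant-4)",
    "prep(shot-2, in-5)",
    "poss(sleep-7, my-6)",
    "pobj(in-5, sleep-7)",
]
triples = parse_dependency_lines(lines)
print(triples[0])  # ('nsubj', 'shot', 'I')
```

The token indices are dropped here to match the tuple layout used below; keeping them would help if a word occurs twice in the sentence.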
To convert this list of tuples into a nested dictionary, I used the code from the following link: How to convert python list of tuples into tree?
def build_tree(list_of_tuples):
    # Map each dependent word to ((relation, governor), children_dict).
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for item in list_of_tuples:
        rel, gov, dep = item
        # Compare strings with !=; "is not" tests identity, not equality.
        if gov != 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root
This gives the output as follows:
{'shot': (('ROOT', 'ROOT'),
{'I': (('nsubj', 'shot'), {}),
'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
'sleep': (('nmod', 'shot'),
{'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}
Building the tree and finding the path are two separate tasks. The second objective is to find the path from the root to a given leaf node, as is done in: Return root to specific leaf from a nested dictionary tree.
However, I want the root-to-leaf path expressed as dependency relations rather than as words.
For instance, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-dobj-det (the dependency relations from the root down to that word) as output.
Answer 1:
This converts the output to the nested dictionary form. I will keep you updated if I can find the path as well. Maybe this is helpful.
list_of_tuples = [('ROOT', 'ROOT', 'shot'), ('nsubj', 'shot', 'I'),
                  ('det', 'elephant', 'an'), ('dobj', 'shot', 'elephant'),
                  ('case', 'sleep', 'in'), ('nmod:poss', 'sleep', 'my'),
                  ('nmod', 'shot', 'sleep')]

# First pass: create one node per dependent word.
nodes = {}
for i in list_of_tuples:
    rel, parent, child = i
    nodes[child] = {'Name': child, 'Relationship': rel}

# Second pass: attach each node to its parent.
forest = []
for i in list_of_tuples:
    rel, parent, child = i
    node = nodes[child]
    if parent == 'ROOT':  # this should be the root node
        forest.append(node)
    else:
        parent = nodes[parent]
        if 'children' not in parent:
            parent['children'] = []
        children = parent['children']
        children.append(node)

print(forest)
The output is a nested dictionary:
[{'Name': 'shot', 'Relationship': 'ROOT',
  'children': [
    {'Name': 'I', 'Relationship': 'nsubj'},
    {'Name': 'elephant', 'Relationship': 'dobj',
     'children': [{'Name': 'an', 'Relationship': 'det'}]},
    {'Name': 'sleep', 'Relationship': 'nmod',
     'children': [{'Name': 'in', 'Relationship': 'case'},
                  {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]
The following function can help you to find the root-to-leaf path:
def recurse_category(categories, to_find):
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []
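To see how the two pieces fit together, here is a self-contained sketch that builds the forest and then looks up the path for 'an'. The wrapper name `build_forest` is hypothetical (it just packages the loop shown earlier), and the sketch assumes every word in the sentence is unique, since nodes are keyed by the word itself.

```python
def build_forest(list_of_tuples):
    """Hypothetical wrapper around the node-building loop shown earlier."""
    nodes = {child: {'Name': child, 'Relationship': rel}
             for rel, _parent, child in list_of_tuples}
    forest = []
    for _rel, parent, child in list_of_tuples:
        node = nodes[child]
        if parent == 'ROOT':
            forest.append(node)
        else:
            nodes[parent].setdefault('children', []).append(node)
    return forest

def recurse_category(categories, to_find):
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []

list_of_tuples = [('ROOT', 'ROOT', 'shot'), ('nsubj', 'shot', 'I'),
                  ('det', 'elephant', 'an'), ('dobj', 'shot', 'elephant'),
                  ('case', 'sleep', 'in'), ('nmod:poss', 'sleep', 'my'),
                  ('nmod', 'shot', 'sleep')]

forest = build_forest(list_of_tuples)
found, path = recurse_category(forest, 'an')
print(found, '-'.join(path))  # True ROOT-dobj-det
```

A word that is not in the tree gives `(False, [])`, so the boolean should be checked before using the path.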
Answer 2:
Firstly, if you're just using the pre-trained model for the Stanford CoreNLP dependency parser, you should use the CoreNLPDependencyParser from nltk.parse.corenlp and avoid using the old nltk.parse.stanford interface.
See Stanford Parser and NLTK
After downloading and starting the Java server in a terminal, run the following in Python:
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>
Now we see that the parses are of type DependencyGraph from nltk.parse.dependencygraph: https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36
You can convert the DependencyGraph to an nltk.tree.Tree object by simply calling DependencyGraph.tree():
>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])
>>> parses[0].tree().pretty_print()
        shot
 _________|____________
 |   |  elephant    banana
 |   |     |      _____|_____
 I   .     an   with         a
To convert it into the bracketed parse format:
>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)
If you're looking for dependency triplets:
>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]
>>> for governor, dep, dependent in parses[0].triples():
... print(governor, dep, dependent)
...
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
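If what you need is the chain of relations from the root down to a word, the triples are enough on their own. Below is a minimal sketch that works on plain tuples copied from the output above, so it runs without the CoreNLP server. The helper name `relation_path` is hypothetical, and it assumes each word occurs only once in the sentence and that the one word without a governor is the root.

```python
# Triples copied from the output above:
# ((governor, tag), relation, (dependent, tag))
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]

def relation_path(triples, word):
    """Return the relations from ROOT down to `word` (hypothetical helper)."""
    # Map each dependent word to its (governor word, relation).
    parent_of = {dep[0]: (gov[0], rel) for gov, rel, dep in triples}
    known_words = set(parent_of) | {gov[0] for gov, _rel, _dep in triples}
    if word not in known_words:
        return []  # the word does not appear in the parse
    path = []
    current = word
    # Walk upwards until we reach the word that has no governor (the root).
    while current in parent_of:
        current, rel = parent_of[current]
        path.append(rel)
    path.append('ROOT')
    return path[::-1]

print('-'.join(relation_path(triples, 'an')))    # ROOT-dobj-det
print('-'.join(relation_path(triples, 'with')))  # ROOT-nmod-case
```

Walking upwards from the word avoids searching the whole tree, at the cost of keying everything on the word form.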
In CoNLL format:
>>> print(parses[0].to_conll(style=10))
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
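The CoNLL rows carry token indices and head indices, which makes them a safer basis for path lookups when a word appears more than once. The sketch below is only an illustration: the helper name `relation_path_by_index` is hypothetical, the text is pasted in as a string rather than taken from a live parser, and it assumes the 10-column layout shown above with the head index in column 7 and the relation in column 8.

```python
conll = """\
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
"""

def relation_path_by_index(conll_text, token_id):
    """Return the relations from ROOT down to the token with this index."""
    rows = {}
    for line in conll_text.splitlines():
        cols = line.split()  # works for tab- or space-separated columns
        if len(cols) != 10:
            continue  # ignore blank or malformed lines
        # token index -> (word, head index, relation)
        rows[int(cols[0])] = (cols[1], int(cols[6]), cols[7])
    path = []
    current = token_id
    while current != 0:  # head index 0 marks the artificial root
        _word, head, rel = rows[current]
        path.append(rel)
        current = head
    return path[::-1]

print('-'.join(relation_path_by_index(conll, 3)))  # ROOT-dobj-det
print('-'.join(relation_path_by_index(conll, 6)))  # ROOT-nmod-det
```

Here token 3 is "an" and token 6 is "a"; an index that is not in the parse raises a KeyError.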
Source: https://stackoverflow.com/questions/52148690/how-to-make-a-tree-from-the-output-of-a-dependency-parser