How to get the dependency tree with spaCy?

柔情痞子 提交于 2019-11-26 18:59:11

问题


I have been trying to find how to get the dependency tree with spaCy but I can't find anything on how to get the tree, only on how to navigate the tree.


回答1:


In case someone wants to easily view the dependency tree produced by spacy, one solution would be to convert it to an nltk.tree.Tree and use the nltk.tree.Tree.pretty_print method. Here is an example:

import spacy
from nltk import Tree


en_nlp = spacy.load('en')

doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]

Output:

                jumps                  
  ________________|____________         
 |    |     |     |    |      over     
 |    |     |     |    |       |        
 |    |     |     |    |      dog      
 |    |     |     |    |    ___|____    
The quick brown  fox   .  the      lazy

Edit: For changing the token representation you can do this:

def tok_format(tok):
    return "_".join([tok.orth_, tok.tag_])


def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(tok_format(node), [to_nltk_tree(child) for child in node.children])
    else:
        return tok_format(node)

Which results in:

                         jumps_VBZ                           
   __________________________|___________________             
  |       |        |         |      |         over_IN        
  |       |        |         |      |            |            
  |       |        |         |      |          dog_NN        
  |       |        |         |      |     _______|_______     
The_DT quick_JJ brown_JJ   fox_NN  ._. the_DT         lazy_JJ



回答2:


The tree isn't an object in itself; you just navigate it via the relationships between tokens. That's why the docs talk about navigating the tree, but not 'getting' it.

First, let's parse some text to get a Doc object:

>>> import spacy
>>> nlp = spacy.load('en')
>>> doc = nlp('First, I wrote some sentences. Then spaCy parsed them. Hooray!')

doc is a Sequence of Token objects:

>>> doc[0]
First
>>> doc[1]
,
>>> doc[2]
I
>>> doc[3]
wrote

But it doesn't have a single root token. We parsed a text made up of three sentences, so there are three distinct trees, each with their own root. If we want to start our parsing from the root of each sentence, it will help to get the sentences as distinct objects, first. Fortunately, doc exposes these to us via the .sents property:

>>> sentences = list(doc.sents)
>>> for sentence in sentences:
...     print(sentence)
... 
First, I wrote some sentences.
Then spaCy parsed them.
Hooray!

Each of these sentences is a Span with a .root property pointing to its root token. Usually, the root token will be the main verb of the sentence (although this may not be true for unusual sentence structures, such as sentences without a verb):

>>> for sentence in sentences:
...     print(sentence.root)
... 
wrote
parsed
Hooray

With the root token found, we can navigate down the tree via the .children property of each token. For instance, let's find the subject and object of the verb in the first sentence. The .dep_ property of each child token describes its relationship with its parent; for instance a dep_ of 'nsubj' means that a token is the nominal subject of its parent.

>>> root_token = sentences[0].root
>>> for child in root_token.children:
...     if child.dep_ == 'nsubj':
...         subj = child
...     if child.dep_ == 'dobj':
...         obj = child
... 
>>> subj
I
>>> obj
sentences

We can likewise keep going down the tree by viewing one of these token's children:

>>> list(obj.children)
[some]

Thus with the properties above, you can navigate the entire tree. If you want to visualise some dependency trees for example sentences to help you understand the structure, I recommend playing with displaCy.




回答3:


It turns out, the tree is available through the tokens in a document.

Would you want to find the root of the tree, you can just go though the document:

def find_root(docu):
    for token in docu:
        if token.head is token:
            return token

To then navigate the tree, the tokens have API to get through the children




回答4:


You can use the library below to view your dependency tree, found it extremely helpful!

from spacy import displacy

nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')
displacy.serve(doc, style='dep')



回答5:


I don't know if this is a new API call or what, but there's a .print_tree() method on the Document class that makes quick work of this.

https://spacy.io/api/doc#print_tree

It dumps the dependency tree to JSON. It deals with multiple sentence roots and all that :

    import spacy    
    nlp = spacy.load('en')
    doc1 = nlp(u'This is the way the world ends.  So you say.')  
    print(doc1.print_tree(light=True))

The name print_tree is a bit of a misnomer, the method itself doesn't print anything, rather it returns a list of dicts, one for each sentence.




回答6:


I also needed to do it so below full code:

import sys
def showTree(sent):
    def __showTree(token):
        sys.stdout.write("{")
        [__showTree(t) for t in token.lefts]
        sys.stdout.write("%s->%s(%s)" % (token,token.dep_,token.tag_))
        [__showTree(t) for t in token.rights]
        sys.stdout.write("}")
    return __showTree(sent.root)

And if you want spacing for the terminal:

def showTree(sent):
    def __showTree(token, level):
        tab = "\t" * level
        sys.stdout.write("\n%s{" % (tab))
        [__showTree(t, level+1) for t in token.lefts]
        sys.stdout.write("\n%s\t%s [%s] (%s)" % (tab,token,token.dep_,token.tag_))
        [__showTree(t, level+1) for t in token.rights]
        sys.stdout.write("\n%s}" % (tab))
    return __showTree(sent.root, 1)



回答7:


I do not have enough knowledge about the parsing yet. However, outcome of my literature study has resulted in knowing that spaCy has a shift-reduce dependency parsing algorithm. This parses the question/sentence, resulting in a parsing tree. To visualize this, you can use the DisplaCy, combination of CSS and Javascript, works with Python and Cython. Furthermore, you can parse using the SpaCy library, and import the Natural Language Toolkit (NLTK). Hope this helps



来源:https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!