Coreference resolution in Python NLTK using Stanford CoreNLP

情话喂你 2020-12-16 00:07

Stanford CoreNLP provides coreference resolution, as mentioned here. This thread, and this one, also give some insight into its implementation in Java.

However, I am

4 answers
  •  攒了一身酷
    2020-12-16 00:22

    Stanford's CoreNLP now has an official Python binding called StanfordNLP, as you can read on the StanfordNLP website.

    The native API doesn't seem to support the coref processor yet, but you can use the CoreNLPClient interface to call the "standard" CoreNLP (the original Java software) from Python.

    So, after following the instructions to set up the Python wrapper here, you can get the coreference chains like this:

    from stanfordnlp.server import CoreNLPClient
    
    text = 'Barack was born in Hawaii. His wife Michelle was born in Milan. He says that she is very smart.'
    print(f"Input text: {text}")
    
    # set up the client; the with block shuts the Java server down on exit
    with CoreNLPClient(properties={'annotators': 'coref', 'coref.algorithm': 'statistical'}, timeout=60000, memory='16G') as client:
        # submit the request to the server
        ann = client.annotate(text)
    
    mychains = list()
    chains = ann.corefChain
    for chain in chains:
        mychain = list()
        # Loop through every mention of this chain
        for mention in chain.mention:
            # Get the sentence in which this mention is located, and get the words which are part of this mention
            # (we can have more than one word, for example, a mention can be a pronoun like "he", but also a compound noun like "His wife Michelle")
            words_list = ann.sentence[mention.sentenceIndex].token[mention.beginIndex:mention.endIndex]
            # build a string out of the words of this mention
            ment_word = ' '.join([x.word for x in words_list])
            mychain.append(ment_word)
        mychains.append(mychain)
    
    for chain in mychains:
        print(' <-> '.join(chain))
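
To make the span slicing in the loop above easier to follow without a running CoreNLP server, here is a minimal sketch that applies the same logic to plain Python data. The `Mention` namedtuple, the helper names `mention_text` and `chains_to_strings`, and the hand-written chains are assumptions made for illustration; they are stand-ins for the objects CoreNLP returns, not its actual output. As the slice in the answer implies, `endIndex` is treated as exclusive.

```python
from collections import namedtuple

# Hypothetical stand-in for a CoreNLP mention: only the three fields
# used by the answer's loop are modelled here.
Mention = namedtuple("Mention", ["sentenceIndex", "beginIndex", "endIndex"])


def mention_text(sentences, mention):
    """Join the tokens covered by a mention (endIndex is exclusive)."""
    tokens = sentences[mention.sentenceIndex]
    return " ".join(tokens[mention.beginIndex:mention.endIndex])


def chains_to_strings(sentences, chains):
    """Turn each chain of mentions into a list of mention strings."""
    return [[mention_text(sentences, m) for m in chain] for chain in chains]


# Tokenized version of the example text from the answer.
sentences = [
    ["Barack", "was", "born", "in", "Hawaii", "."],
    ["His", "wife", "Michelle", "was", "born", "in", "Milan", "."],
    ["He", "says", "that", "she", "is", "very", "smart", "."],
]

# Hand-written chains for illustration only, not real CoreNLP output.
chains = [
    [Mention(0, 0, 1), Mention(1, 0, 1), Mention(2, 0, 1)],
    [Mention(1, 0, 3), Mention(2, 3, 4)],
]

result = chains_to_strings(sentences, chains)
for chain in result:
    print(" <-> ".join(chain))
```

A multi-word mention such as "His wife Michelle" comes out as one string because the slice covers token positions 0 to 2 of the second sentence.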
    
