Get corresponding verbs and nouns for adverbs and adjectives

Question

How can I get the corresponding verbs and nouns for adverbs and adjectives in Python? Simply taking the word that precedes or follows may not be very accurate, because there may be stop words in between, such as "to" in "I am delighted to learn...".

I can't find any library for this, or even a formal statement of the problem.

This is my code right now. I want to return the corresponding verb for each adverb and the corresponding noun for each adjective in the sentence. Please help.

Code:
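from nltk import pos_tag
from nltk.tokenize import TreebankWordTokenizer

def pos_func(input_text):
    # Tokenize, POS-tag, then group the words by part of speech
    tokens = tokenize_words(input_text)
    tagged = pos_tag(tokens)
    return pos_store(tagged)

def pos_store(tagged):
    verbs = []
    adjectives = []
    adverbs = []
    nouns = []
    # Penn Treebank tags: V* = verb, N* = noun, J* = adjective, RB* = adverb
    for word, pos in tagged:
        if pos.startswith('V'):
            verbs.append(word)
        elif pos.startswith('N'):
            nouns.append(word)
        elif pos.startswith('J'):
            adjectives.append(word)
        elif pos.startswith('RB'):
            adverbs.append(word)
    return verbs, nouns, adjectives, adverbs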


def tokenize_words(text):
    tokens = TreebankWordTokenizer().tokenize(text)
    contractions = ["n't", "'ll", "'m"]
    fix = []
    for i in range(len(tokens)):
        for c in contractions:
            if tokens[i] == c: fix.append(i)
    fix_offset = 0
    for fix_id in fix:
        idx = fix_id - 1 - fix_offset
        tokens[idx] = tokens[idx] + tokens[idx+1]
        del tokens[idx+1]
        fix_offset += 1
    return tokens

Answer 1:

The general problem you are trying to solve is called dependency parsing. To extract such relations between words you need more than just the linear sequence of words that a simple POS tagging analysis offers. Consider the following sentence:

"He bought a beautiful and fast car." Here you would want to extract (beautiful, car) and (fast, car). This is a harder problem than just filtering stop words between an adjective and its noun (or an adverb and its verb): "beautiful" is separated from "car" by "and fast". Looking at the parse tree gives a better idea of why this cannot be solved from the word sequence alone.

This is the parse tree for our sentence:

(ROOT
  (S
    (NP (PRP He))
    (VP (VBD bought)
      (NP (DT a)
        (ADJP (JJ beautiful)
          (CC and)
          (JJ fast))
        (NN car)))
    (. .)))

As you can see, "a beautiful and fast car" is a noun phrase (NP) containing a determiner (DT), an adjectival phrase (ADJP, "beautiful and fast") and a noun (NN, "car"). One approach that was used for some time was to build a rule-based system that extracted the pairs from this parse tree. Fortunately, something even better has been developed that addresses your problem directly.
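A minimal sketch of that rule-based idea, using only the Python standard library. The helper names (parse_tree, adjective_noun_pairs) and the rule "the rightmost noun directly under an NP is its head" are illustrative assumptions, not part of any parser's API, and real sentences would need many more rules than this.

```python
import re

TREE = """
(ROOT
  (S
    (NP (PRP He))
    (VP (VBD bought)
      (NP (DT a)
        (ADJP (JJ beautiful)
          (CC and)
          (JJ fast))
        (NN car)))
    (. .)))
"""

def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def words_tagged(node, prefix):
    """Collect the words below `node` whose POS tag starts with `prefix`."""
    label, children = node
    if children and isinstance(children[0], str):
        return [children[0]] if label.startswith(prefix) else []
    return [w for child in children for w in words_tagged(child, prefix)]

def adjective_noun_pairs(node):
    """Hypothetical rule: pair adjectives in an NP with its rightmost noun."""
    label, children = node
    if children and isinstance(children[0], str):
        return []                     # POS node over a single word
    pairs = []
    if label == "NP":
        nouns = [c[1][0] for c in children if c[0].startswith("NN")]
        adjectives = []
        for child in children:
            if child[0].startswith("JJ") or child[0] == "ADJP":
                adjectives.extend(words_tagged(child, "JJ"))
        if nouns:
            pairs.extend((adj, nouns[-1]) for adj in adjectives)
    for child in children:
        pairs.extend(adjective_noun_pairs(child))
    return pairs

print(adjective_noun_pairs(parse_tree(TREE)))
# → [('beautiful', 'car'), ('fast', 'car')]
```

Even for this one sentence the rule has to know about ADJP nesting, which hints at why hand-written rules over constituency trees become hard to maintain.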

The dependency pairs are:

nsubj(bought-2, He-1)
root(ROOT-0, bought-2)
det(car-7, a-3)
amod(car-7, beautiful-4)
cc(beautiful-4, and-5)
conj:and(beautiful-4, fast-6)
amod(car-7, fast-6)
dobj(bought-2, car-7)

As you can see, this is exactly what you need. These are typed dependencies, so you'll also need to filter for the ones you are interested in (amod and advmod in your case).
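As a rough illustration of that filtering step, the sketch below parses dependency lines in the textual form shown above and keeps only the wanted relation types. The function names and the regular expression are assumptions made for this example; a Python wrapper around the parser may hand you structured objects instead of strings, in which case only the filtering part applies.

```python
import re

STANFORD_OUTPUT = """
nsubj(bought-2, He-1)
root(ROOT-0, bought-2)
det(car-7, a-3)
amod(car-7, beautiful-4)
cc(beautiful-4, and-5)
conj:and(beautiful-4, fast-6)
amod(car-7, fast-6)
dobj(bought-2, car-7)
"""

# relation(governor-index, dependent-index)
DEP_RE = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def parse_dependencies(lines):
    """Turn 'rel(gov-i, dep-j)' lines into (rel, governor, dependent) triples."""
    triples = []
    for line in lines:
        match = DEP_RE.fullmatch(line.strip())
        if match:  # silently skip blank or unrecognised lines
            rel, governor, _, dependent, _ = match.groups()
            triples.append((rel, governor, dependent))
    return triples

def filter_dependencies(triples, wanted=("amod", "advmod")):
    """Keep (modifier, modified word) pairs for the wanted relation types."""
    return [(dep, gov) for rel, gov, dep in triples if rel in wanted]

triples = parse_dependencies(STANFORD_OUTPUT.splitlines())
print(filter_dependencies(triples))
# → [('beautiful', 'car'), ('fast', 'car')]
```

For an amod relation the governor is the noun and the dependent is the adjective; for advmod the governor is typically the verb and the dependent the adverb, so the same filter covers both cases from the question.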

You can find the full list of dependency types here: http://nlp.stanford.edu/software/dependencies_manual.pdf

Stanford Parser demo: http://nlp.stanford.edu:8080/parser/

Stanford CoreNLP demo (for the cool visualisations): http://nlp.stanford.edu:8080/corenlp/

You can read a great article about creating a dependency parser in Python here (you will need training data though): https://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/

Python interface to CoreNLP: https://github.com/dasmith/stanford-corenlp-python

You can also try writing your own dependency grammar; NLTK offers an API for that (see section 5, "Dependencies and Dependency Grammar"): http://www.nltk.org/book/ch08.html

Answer 2:

Using the spaCy library and the sample sentence from bogs' answer, I get output close to Stanford's.

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> doc = nlp("He bought a beautiful and fast car.")
>>> # Match the output style of the Stanford parser for comparison
>>> for token in doc:
...     print(f"{token.dep_}({token.head.text}-{token.head.i+1}, {token.text}-{token.i+1})")
...

nsubj(bought-2, He-1)
ROOT(bought-2, bought-2)
det(car-7, a-3)
amod(car-7, beautiful-4)
cc(beautiful-4, and-5)
conj(beautiful-4, fast-6)
dobj(bought-2, car-7)
punct(bought-2, .-8)

Interestingly, it misses the direct amod connection with the car being fast.

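The parse can also be visualised, for example in a Jupyter notebook:

>>> from spacy import displacy
>>> displacy.render(doc, style="dep", jupyter=True, options={'distance': 100})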
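The amod link between "fast" and "car" that is missing from the spaCy output above can be recovered by following the conj link back to the first conjunct and reusing its relation. The sketch below does this on plain tuples so that it runs without a model; the function name modifier_pairs and the tuple layout are assumptions for this example. With spaCy, the rows could be built as [(t.dep_, t.head.i, t.i, t.text) for t in doc], and the result will depend on the parse the loaded model produces.

```python
# Rows transcribed from the spaCy output above, with 0-based token indices:
# (dependency label, head index, token index, token text)
ROWS = [
    ("nsubj", 1, 0, "He"),
    ("ROOT", 1, 1, "bought"),
    ("det", 6, 2, "a"),
    ("amod", 6, 3, "beautiful"),
    ("cc", 3, 4, "and"),
    ("conj", 3, 5, "fast"),
    ("dobj", 1, 6, "car"),
    ("punct", 1, 7, "."),
]

def modifier_pairs(rows, wanted=("amod", "advmod")):
    """Return (modifier, modified word) pairs, resolving conjoined modifiers."""
    label_of = {index: dep for dep, _, index, _ in rows}
    head_of = {index: head for _, head, index, _ in rows}
    text_of = {index: text for _, _, index, text in rows}

    pairs = []
    for dep, head, index, text in rows:
        label, target = dep, head
        visited = set()
        # A conjunct inherits the relation of the word it is conjoined with,
        # so walk up the conj chain until a non-conj label is reached.
        while label == "conj" and target not in visited:
            visited.add(target)
            label, target = label_of[target], head_of[target]
        if label in wanted:
            pairs.append((text, text_of[target]))
    return pairs

print(modifier_pairs(ROWS))
# → [('beautiful', 'car'), ('fast', 'car')]
```

The same walk handles adverbs joined by a conjunction, since a conj token whose first conjunct is an advmod ends up paired with that conjunct's verb.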

Source: https://stackoverflow.com/questions/32329039/get-corresponding-verbs-and-nouns-for-adverbs-and-adjectives

Tags

python

nlp

nltk

stanford-nlp