How to identify the subject of a sentence?

我在风中等你 2020-12-14 18:53

Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For e.g.

6 answers
  • 2020-12-14 19:27

    rake_nltk (pip install rake_nltk) is a Python library that wraps NLTK and uses the RAKE (Rapid Automatic Keyword Extraction) algorithm. Note that it extracts keywords rather than the grammatical subject.

    from rake_nltk import Rake
    
    rake = Rake()
    
    # extract_keywords_from_text() returns None; the results are stored on the Rake object
    rake.extract_keywords_from_text("Can Python + NLTK be used to identify the subject of a sentence?")
    
    ranked_phrases = rake.get_ranked_phrases()
    
    print(ranked_phrases)
    
    # outputs the keywords ordered by rank
    # ['used', 'subject', 'sentence', 'python', 'nltk', 'identify']
    
    

    By default the stopword list from NLTK is used. You can provide your own stopwords and punctuation characters by passing them to the constructor:

    # stopwords must be a collection of words, not a file name, so read the file first
    rake = Rake(stopwords=set(open('mystopwords.txt').read().split()), punctuations=''',;:!@#$%^*/\''')
    

    By default string.punctuation is used for punctuation.

    The constructor also accepts a language keyword which can be any language supported by nltk.

  • 2020-12-14 19:28

    As the NLTK book (exercise 29) says, "One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP."

    Look at the tree example: "I" is indeed the noun phrase that is the child of S and the sibling of VP, while "elephant" is not.
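
    The definition above can be sketched with nltk.Tree. This is only an illustration: the bracketed parse is hand-written for "I shot an elephant" (not produced by a parser), and find_subjects is a hypothetical helper name.

```python
from nltk import Tree


def find_subjects(tree):
    """Return the text of every NP that is a child of S and a sibling of VP."""
    subjects = []
    for s in tree.subtrees(lambda t: t.label() == "S"):
        # Only direct children of S count; leaves (plain strings) are skipped.
        children = [c for c in s if isinstance(c, Tree)]
        if any(c.label() == "VP" for c in children):
            subjects.extend(
                " ".join(c.leaves()) for c in children if c.label() == "NP"
            )
    return subjects


# Hand-written parse (an assumption for illustration, not parser output).
parse = Tree.fromstring("(S (NP I) (VP (V shot) (NP (Det an) (N elephant))))")
print(find_subjects(parse))  # ['I']
```

    "elephant" is not returned because its NP is a child of VP, not of S.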

  • 2020-12-14 19:31

    You can paper over the issue by doing something like doc = nlp(text.decode('utf8')), but this will likely cause you more bugs in the future.

    Credits: https://github.com/explosion/spaCy/issues/380

  • 2020-12-14 19:35

    You can use spaCy.

    Code

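    import spacy

    # The 'en' shortcut was removed in spaCy 3; load the model by its full name.
    # Install it first with: python -m spacy download en_core_web_sm
    nlp = spacy.load('en_core_web_sm')
    sent = "I shot an elephant"
    doc = nlp(sent)

    # Keep the tokens whose dependency label is nominal subject (nsubj)
    sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]

    print(sub_toks)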
    
  • 2020-12-14 19:51

    Stanford CoreNLP can also be used to extract subject-relation-object information from a sentence.

    Attaching a screenshot of the same: [screenshot not available]

  • 2020-12-14 19:52

    The English language has two voices: active and passive. Let's take the most used one, the active voice.

    It follows the subject-verb-object model. To mark the subject, write a rule set over POS tags. Tag the sentence: I[NOUN] shot[VERB] an elephant[NOUN]. The first noun is the subject, then comes a verb, and then the object.

    If you want to make it more complicated, take the sentence "I shot an elephant with a gun." Here prepositions or subordinating conjunctions such as with, at, and in can be given roles. The sentence will be tagged as I[NOUN] shot[VERB] an elephant[NOUN] with[IN] a gun[NOUN]. You can easily say that the word "with" marks an instrumental role. In this way you can build a rule-based system to get the role of every word in the sentence.

    Also look at the patterns in the passive voice and write rules for them.
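
    The active-voice rules described above can be sketched roughly as follows. This is a minimal illustration, not a complete system: the input is assumed to be already tagged with the simplified tags used in this answer (NOUN, VERB, IN, plus DET for "an"/"a", which is an addition), and assign_roles and the "prep:" key prefix are hypothetical names.

```python
def assign_roles(tagged):
    """Assign naive roles to an active-voice, subject-verb-object sentence.

    `tagged` is a list of (word, tag) pairs using simplified tags.
    """
    roles = {}
    seen_verb = False
    pending_prep = None  # preposition waiting for its noun
    for word, tag in tagged:
        if tag == "VERB" and not seen_verb:
            seen_verb = True
            roles["verb"] = word
        elif tag == "IN":
            pending_prep = word
        elif tag == "NOUN":
            if pending_prep is not None:
                # A noun after a preposition takes that preposition's role.
                roles.setdefault("prep:" + pending_prep, word)
                pending_prep = None
            elif not seen_verb:
                roles.setdefault("subject", word)  # first noun before the verb
            else:
                roles.setdefault("object", word)  # first noun after the verb
    return roles


tagged = [("I", "NOUN"), ("shot", "VERB"), ("an", "DET"), ("elephant", "NOUN"),
          ("with", "IN"), ("a", "DET"), ("gun", "NOUN")]
print(assign_roles(tagged))
# {'subject': 'I', 'verb': 'shot', 'object': 'elephant', 'prep:with': 'gun'}
```

    Passive sentences would need a separate rule set, since there the first noun is usually not the agent.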
