How to identify the subject of a sentence?


Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For e.g.


6 Answers

自闭症患者

2020-12-14 19:27
rake_nltk (pip install rake_nltk) is a Python library that wraps NLTK and uses the RAKE (Rapid Automatic Keyword Extraction) algorithm.
```
from rake_nltk import Rake

rake = Rake()

# extract_keywords_from_text() works in place and returns None
rake.extract_keywords_from_text("Can Python + NLTK be used to identify the subject of a sentence?")

ranked_phrases = rake.get_ranked_phrases()

print(ranked_phrases)

# Outputs the keywords ordered by rank:
# ['used', 'subject', 'sentence', 'python', 'nltk', 'identify']
```
By default the stopword list from nltk is used. You can provide your custom stopword list and punctuation chars by passing them in the constructor:
```
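my_stopwords = open('mystopwords.txt').read().split()  # stopwords must be a collection of words, not a file name
rake = Rake(stopwords=my_stopwords, punctuations=',;:!@#$%^*/\\')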
```
By default string.punctuation is used for punctuation.

The constructor also accepts a language keyword which can be any language supported by nltk.
萌比男神i

2020-12-14 19:28

As the NLTK book (exercise 29) says, "One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP."

Look at the tree example: indeed, "I" is the noun phrase that is the child of S and the sibling of VP, while "elephant" is not.
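
That definition can be sketched directly on an NLTK parse tree. This is only an illustration: the bracketed parse below is written by hand for "I shot an elephant" (in practice it would come from a parser), and `find_subjects` is a hypothetical helper name, not part of NLTK.

```python
from nltk import Tree


def find_subjects(tree):
    """Return every NP that is a child of an S node and has a VP sibling."""
    subjects = []
    for s in tree.subtrees(lambda t: t.label() == "S"):
        # Only look at the direct children of S that are subtrees (not leaves).
        children = [child for child in s if isinstance(child, Tree)]
        if any(child.label() == "VP" for child in children):
            subjects.extend(
                " ".join(child.leaves())
                for child in children
                if child.label() == "NP"
            )
    return subjects


# Hand-written parse, assumed for illustration only.
parse = Tree.fromstring("(S (NP I) (VP (V shot) (NP (Det an) (N elephant))))")
print(find_subjects(parse))  # ['I']
```

The NP "an elephant" is skipped because its parent is VP, not S.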

太阳男子

2020-12-14 19:31

You can paper over the issue by doing something like doc = nlp(text.decode('utf8')), but this will likely bring you more bugs in the future.

Credits: https://github.com/explosion/spaCy/issues/380


死守一世寂寞

2020-12-14 19:35

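You can use spaCy.

Code
```
import spacy

# In spaCy 3+ the 'en' shortcut no longer exists; load a named model instead
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
sent = "I shot an elephant"
doc = nlp(sent)

# Keep the tokens whose dependency label marks them as a nominal subject
sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]

print(sub_toks)
```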


忘了有多久

2020-12-14 19:51

The Stanford CoreNLP tool can also be used to extract subject-relation-object information from a sentence.

Attaching a screenshot of the same:

南笙

2020-12-14 19:52

English has two voices: active and passive. Let's take the more commonly used one, the active voice.

It follows the subject-verb-object model. To mark the subject, write a rule set over POS tags. Tag the sentence as I[NOUN] shot[VERB] an elephant[NOUN]. The first noun is the subject, then comes the verb, and the noun after the verb is the object.

To make it more complicated, take the sentence "I shot an elephant with a gun." Here the prepositions or subordinating conjunctions such as with, at, and in can be given roles. The sentence will be tagged as I[NOUN] shot[VERB] an elephant[NOUN] with[IN] a gun[NOUN]. You can easily say that the word "with" marks an instrumental role. In this way you can build a rule-based system that assigns a role to every word in the sentence.

Also look at the patterns in the passive voice and write rules for them.
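
The active-voice rules above could be sketched roughly as follows. This is a toy illustration, not a robust parser: it assumes the sentence is already tagged as (word, tag) pairs, the function name `assign_roles` is made up for this example, and the DET tag for "an" and "a" is an assumption, since the tagging above leaves those words unlabelled.

```python
def assign_roles(tagged):
    """Assign subject/verb/object/instrument roles to an active-voice sentence.

    Rules: the first noun before the verb is the subject, the first noun
    after the verb is the object, and a noun following "with" is the instrument.
    """
    roles = {}
    seen_verb = False
    after_with = False
    for word, tag in tagged:
        if tag == "VERB" and not seen_verb:
            roles["verb"] = word
            seen_verb = True
        elif tag == "IN":
            # Only "with" is given a role in this sketch.
            after_with = word.lower() == "with"
        elif tag == "NOUN":
            if not seen_verb:
                roles.setdefault("subject", word)
            elif after_with:
                roles.setdefault("instrument", word)
                after_with = False
            else:
                roles.setdefault("object", word)
    return roles


tagged = [
    ("I", "NOUN"), ("shot", "VERB"), ("an", "DET"), ("elephant", "NOUN"),
    ("with", "IN"), ("a", "DET"), ("gun", "NOUN"),
]
print(assign_roles(tagged))
# {'subject': 'I', 'verb': 'shot', 'object': 'elephant', 'instrument': 'gun'}
```

Passive sentences would need their own rules, since there the first noun is no longer the agent.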
