Difference between similar() and concordance() in NLTK

♀尐吖头ヾ, submitted on 2019-11-30 10:12:10

Using concordance(token) gives you the context surrounding the argument token. It shows every place where token occurs in the text, each with a fixed-width window of the surrounding text.

Using similar(token) prints a list of words that appear in the same contexts as token. Here the context is just the word directly on either side of token.
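To make the two notions of "context" concrete, here is a minimal pure-Python sketch (no NLTK needed) on a made-up token list. The helper names `window` and `neighbour_contexts` are hypothetical: `window` collects a few tokens on either side of each occurrence, roughly what a concordance line shows, while `neighbour_contexts` keeps only the (left, right) word pair, which is the narrower kind of context that similar() compares.

```python
def window(tokens, word, n=3):
    """Hypothetical helper: n tokens either side of every occurrence of `word`."""
    word = word.lower()
    return [
        (tokens[max(0, i - n):i], tokens[i + 1:i + 1 + n])
        for i, tok in enumerate(tokens)
        if tok.lower() == word
    ]


def neighbour_contexts(tokens, word):
    """Hypothetical helper: the (left, right) neighbour pair for each occurrence."""
    word = word.lower()
    pairs = set()
    for i, tok in enumerate(tokens):
        if tok.lower() != word:
            continue
        # Placeholders stand in for a missing neighbour at the text boundaries.
        left = tokens[i - 1].lower() if i > 0 else "*START*"
        right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
        pairs.add((left, right))
    return pairs


tokens = "one was of a most monstrous size . This came towards us".split()
print(window(tokens, "monstrous"))              # [(['of', 'a', 'most'], ['size', '.', 'This'])]
print(neighbour_contexts(tokens, "monstrous"))  # {('most', 'size')}
```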

So, let's look at the Moby Dick text (text1) and check the concordance of 'monstrous':

from nltk.book import text1  # needs the NLTK book data, e.g. nltk.download('book')
text1.concordance('monstrous')

# prints:
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
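Notice that the lines above are cut by character count, not at sentence boundaries ("ong the former", "ork we have r"). The sketch below is a hypothetical keyword-in-context formatter written only to illustrate that idea; it is not NLTK's code, and the `kwic` name, the `width` default and the "gather about width // 4 tokens, then trim to half the width" rule are assumptions chosen for the example.

```python
def kwic(tokens, word, width=79):
    """Hypothetical keyword-in-context display: one fixed-width line per match."""
    half = (width - len(word) - 2) // 2  # characters of context kept on each side
    span = width // 4                     # tokens gathered on each side before trimming
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != word.lower():
            continue
        left_text = " ".join(tokens[max(0, i - span):i])
        # Keep only the last `half` characters of the left context, padded if short.
        left = left_text[max(0, len(left_text) - half):].rjust(half)
        # Keep only the first `half` characters of the right context.
        right = " ".join(tokens[i + 1:i + 1 + span])[:half]
        lines.append(f"{left} {tok} {right}")
    return lines


tokens = "one was of a most monstrous size . This came towards us".split()
for line in kwic(tokens, "monstrous", width=40):
    print(line)  # " was of a most monstrous size . This ca"
```

Trimming by characters is what makes the keyword line up in a single column, at the cost of chopping words at both edges.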

Next, we can list the words that appear in contexts similar to those of 'monstrous'. For example, in the first concordance line above the context is 'most _____ size'.

text1.similar('monstrous')

# prints:
determined maddens contemptible modifies abundant tyrannical puzzled
trustworthy impalpable gamesome curious mean pitiable untoward
christian subtly passing domineering uncommon true
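Here is a simplified re-implementation of the idea behind similar(), as a sketch rather than NLTK's actual source. It assumes a context is the lower-cased word immediately to the left and right of a token, indexes only alphabetic tokens, and ranks candidate words by how many distinct contexts they share with the query word. The function names and the toy sentence are hypothetical; NLTK prints its result, whereas this sketch returns a list so it can be inspected.

```python
from collections import Counter, defaultdict


def context_index(tokens):
    """Map each alphabetic word (lower-cased) to the set of its (left, right) contexts."""
    index = defaultdict(set)
    for i, tok in enumerate(tokens):
        if not tok.isalpha():  # punctuation is not indexed, but can still be a neighbour
            continue
        left = tokens[i - 1].lower() if i > 0 else "*START*"
        right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
        index[tok.lower()].add((left, right))
    return index


def similar(tokens, word, num=20):
    """Words sharing at least one (left, right) context with `word`, most shared first."""
    index = context_index(tokens)
    word = word.lower()
    if word not in index:
        return []
    target = index[word]
    scores = Counter()
    for other, contexts in index.items():
        shared = len(target & contexts)
        if other != word and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(num)]


tokens = "the monstrous whale . the huge whale . the red whale . a huge ship".split()
print(similar(tokens, "monstrous"))  # ['huge', 'red']
```

Both 'huge' and 'red' are listed because they fill the same 'the _____ whale' slot as 'monstrous'; 'whale' is not, since it never appears between 'the' and 'whale'.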

If we take the word 'true' and check its concordance with text1.concordance('true'), we only see the first 25 of its 87 occurrences (25 lines is the default). This isn't terribly useful on its own, but NLTK also provides a method called common_contexts that shows the surrounding words shared by a list of words.

text1.common_contexts(['monstrous', 'true'])

# prints:
the_pictures

This result tells us that the phrases "the monstrous pictures" and "the true pictures" both appear in Moby Dick.
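A minimal sketch of what common_contexts computes, under the same assumption as before that a context is the lower-cased (left, right) neighbour pair: build the set of contexts for each word and intersect them. This is an illustration, not NLTK's implementation; the helper names and the toy sentence are hypothetical, and the result is simply sorted alphabetically.

```python
from collections import defaultdict


def context_index(tokens):
    """Map each alphabetic word (lower-cased) to the set of its (left, right) contexts."""
    index = defaultdict(set)
    for i, tok in enumerate(tokens):
        if not tok.isalpha():
            continue
        left = tokens[i - 1].lower() if i > 0 else "*START*"
        right = tokens[i + 1].lower() if i < len(tokens) - 1 else "*END*"
        index[tok.lower()].add((left, right))
    return index


def common_contexts(tokens, words):
    """Contexts shared by every word in `words`, formatted as 'left_right'."""
    if not words:
        return []
    index = context_index(tokens)
    sets = [index.get(w.lower(), set()) for w in words]
    shared = set.intersection(*sets)
    return sorted(f"{left}_{right}" for left, right in shared)


tokens = "the monstrous pictures of whales and the true pictures of men".split()
print(common_contexts(tokens, ["monstrous", "true"]))  # ['the_pictures']
```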

I will explain with an example:

text1.similar("monstrous")

will output words that occur in the same kind of context (word1 ______ word2) as monstrous. For example, its output includes the word doleful. If you run:

text1.concordance("monstrous")

You will see among the matches the line:

that has survived the flood ; most monstrous and most mountainous ! That Himmal

If you run:

text1.concordance("doleful")

You will see among the matches the line:

ite perspectives . There ' s a most doleful and most mocking funeral ! The sea

Finally, running

text1.common_contexts(["monstrous", "doleful"])

prints the surrounding words that monstrous and doleful have in common, "most" on the left and "and" on the right:

most_and
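To check this by hand, the sketch below takes only the two concordance lines quoted above (already tokenized on spaces), collects the (left, right) neighbour pair around each word, and intersects them. It is a simplified stand-in that looks at just these two lines rather than all of Moby Dick, not NLTK's implementation; `neighbour_pairs` is a hypothetical helper.

```python
def neighbour_pairs(tokens, word):
    """(left, right) neighbours around each occurrence of `word`, case-insensitive.

    Occurrences at the very first or last token are skipped for simplicity.
    """
    word = word.lower()
    return {
        (tokens[i - 1].lower(), tokens[i + 1].lower())
        for i in range(1, len(tokens) - 1)
        if tokens[i].lower() == word
    }


# The two concordance lines quoted above.
line_a = "that has survived the flood ; most monstrous and most mountainous ! That Himmal"
line_b = "ite perspectives . There ' s a most doleful and most mocking funeral ! The sea"

shared = neighbour_pairs(line_a.split(), "monstrous") & neighbour_pairs(line_b.split(), "doleful")
print(["_".join(pair) for pair in sorted(shared)])  # ['most_and']
```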

blueeyes0710

concordance(token) provides you with the contexts in which a token is used. similar(token) provides you with other words that appear in similar contexts.

To illustrate, here is a more general description that approximates their functionality.

1) concordance(token): this shows a predefined amount of text (roughly a fixed number of words) to the left and right of your token (let's call this collection of words "Z"). It does that for each instance of your token in the text.

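2) similar(token): a word is listed here if it also occurs in some of the same contexts as your token. In NLTK a context is narrower than "Z": it is just the single word immediately to the left and right of the token, and words that share more of these contexts are listed first.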

As per the NLTK docs, a concordance view shows us every occurrence of a given word, together with some context. The NLTK book's example is text1.concordance("monstrous").

similar() is used to find other words that appear in a similar range of contexts. The NLTK book's example is text1.similar("monstrous").
