Difference between similar() and concordance() in NLTK

I have read about text1.similar("monstrous") and text1.concordance("monstrous") from this.

However, I couldn't find a satisfactory explanation of the difference between text1.concordance('monstrous') and text1.similar('monstrous') in the Natural Language Toolkit (NLTK) for Python.

Could you please explain the difference in detail, with an example?

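Using concordance(token) shows you the context surrounding the argument token. It prints each occurrence of token inside a fixed-width window of surrounding text (79 characters by default), so you see fragments of the sentences where token appears rather than whole sentences.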
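As a rough illustration (not NLTK's actual code), here is a minimal keyword-in-context sketch of what a concordance view does. The helper name `kwic` and its `half_width` parameter are hypothetical; NLTK's own `concordance` takes a total `width` (79 characters by default) and a `lines` limit (25 by default).

```python
def kwic(tokens, word, half_width=30):
    """Hypothetical helper: one line per occurrence of `word` (case-insensitive),
    showing a fixed number of characters of context on each side."""
    lines = []
    target = word.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Keep only the last/first `half_width` characters of context
            left = " ".join(tokens[:i])[-half_width:]
            right = " ".join(tokens[i + 1:])[:half_width]
            # Right-align the left context so the keyword lines up in a column
            lines.append(f"{left:>{half_width}} {tok} {right}")
    return lines


tokens = "one was of a most monstrous size . This came towards us".split()
for line in kwic(tokens, "monstrous", half_width=15):
    print(line)
```

Because the window is measured in characters, the context is cut mid-word, just like the truncated lines in the Moby Dick output in this thread.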

Using similar(token) prints a list of words that appear in the same contexts as token. Here the context is just the words directly on either side of token.
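Here is a simplified sketch of that idea, loosely modelled on how NLTK's `ContextIndex` appears to work (check the NLTK source for the real details). Each alphabetic word is mapped to the (left neighbour, right neighbour) pairs it occurs between, and other words are ranked by how often they occur in a pair the target word also occurs in. The names `context_index` and `similar_words` are hypothetical.

```python
from collections import Counter, defaultdict


def context_index(tokens):
    """Map each lowercased alphabetic word to the (left, right) neighbour
    pairs it appears between; '*START*'/'*END*' mark the text boundaries."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        if not tok.isalpha():
            continue  # punctuation is never indexed, but can be a neighbour
        left = tokens[i - 1].lower() if i > 0 else "*START*"
        right = tokens[i + 1].lower() if i + 1 < len(tokens) else "*END*"
        index[tok.lower()].append((left, right))
    return index


def similar_words(tokens, word, num=20):
    """Hypothetical helper: rank other words by how often they occur in a
    (left, right) context that `word` also occurs in."""
    index = context_index(tokens)
    target = word.lower()
    if target not in index:
        return []
    target_contexts = set(index[target])
    counts = Counter(
        w
        for w, ctxs in index.items()
        if w != target
        for c in ctxs
        if c in target_contexts
    )
    return [w for w, _ in counts.most_common(num)]


tokens = ("most monstrous size . most doleful size . most doleful size . "
          "most curious size").split()
print(similar_words(tokens, "monstrous"))
```

'doleful' ranks above 'curious' because it fills the "most ___ size" slot twice; note that the word 'monstrous' itself never appears in the result.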

So, looking at the Moby Dick text (text1), we can check the concordance of 'monstrous':

text1.concordance('monstrous')

# returns:
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u

Then we can get a list of words that appear in similar contexts to 'monstrous'. For example, the context in the first concordance line above is 'most _____ size'.

text1.similar('monstrous')

# returns:
determined maddens contemptible modifies abundant tyrannical puzzled
trustworthy impalpable gamesome curious mean pitiable untoward
christian subtly passing domineering uncommon true

If we take the word 'true' and check its concordance with text1.concordance('true'), we get back the first 25 of 87 uses of the word 'true' (25 is the default value of the lines parameter). This isn't terribly useful, but NLTK provides an additional method, common_contexts, that shows the contexts (the same surrounding words) shared by every word in a list.

text1.common_contexts(['monstrous', 'true'])

# returns:
the_pictures

This result tells us that the phrases "the monstrous pictures" and "the true pictures" both appear in Moby Dick.
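A minimal sketch of the same idea, assuming the one-word-left, one-word-right context described above: collect each word's set of (left, right) pairs and intersect the sets. The names `contexts_by_word` and `shared_contexts` are hypothetical, and unlike NLTK (which ranks shared contexts by frequency) this sketch simply sorts them.

```python
from collections import defaultdict


def contexts_by_word(tokens):
    """Set of (left, right) neighbour pairs for every lowercased alphabetic word."""
    index = defaultdict(set)
    for i, tok in enumerate(tokens):
        if tok.isalpha():
            left = tokens[i - 1].lower() if i > 0 else "*START*"
            right = tokens[i + 1].lower() if i + 1 < len(tokens) else "*END*"
            index[tok.lower()].add((left, right))
    return index


def shared_contexts(tokens, words):
    """Hypothetical helper: contexts in which *every* word in `words` occurs,
    formatted as 'left_right'."""
    index = contexts_by_word(tokens)
    sets = [index.get(w.lower(), set()) for w in words]
    common = set.intersection(*sets) if sets else set()
    return sorted(f"{left}_{right}" for left, right in common)


tokens = ("the monstrous pictures of whales and the true pictures of "
          "whales , a true story and a monstrous fable").split()
print(shared_contexts(tokens, ["monstrous", "true"]))
```

Both words also occur after "a", but in different right-hand contexts ("fable" vs "story"), so only "the_pictures" survives the intersection.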

I will explain with an example:

text1.similar("monstrous")

outputs words that occur in similar contexts, i.e. in the same word1 ______ word2 slot. For example, it outputs the word doleful. If you run:

text1.concordance("monstrous")

You will see among the matches the line:

that has survived the flood ; most monstrous and most mountainous ! That Himmal

If you run:

text1.concordance("doleful")

You will see among the matches the line:

ite perspectives . There ' s a most doleful and most mocking funeral ! The sea

Finally,

text1.common_contexts(["monstrous", "doleful"])

outputs the surrounding words that monstrous and doleful share, namely "most" on the left and "and" on the right:

most_and
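To see where "most_and" comes from, here is a small sketch that takes the two concordance lines quoted above and extracts the words on either side of the keyword. The helper `slot_context` is hypothetical, and it only checks these two lines; common_contexts considers every occurrence in the whole text.

```python
def slot_context(line, word):
    """Hypothetical helper: (left, right) neighbours of the first occurrence
    of `word` in a whitespace-tokenized line. Assumes `word` is present and
    is neither the first nor the last token."""
    tokens = line.lower().split()
    i = tokens.index(word.lower())
    return tokens[i - 1], tokens[i + 1]


# The two concordance lines quoted in this answer
monstrous_line = ("that has survived the flood ; most monstrous and most "
                  "mountainous ! That Himmal")
doleful_line = ("ite perspectives . There ' s a most doleful and most "
                "mocking funeral ! The sea")

a = slot_context(monstrous_line, "monstrous")
b = slot_context(doleful_line, "doleful")
# Both words fill the same "most ___ and" slot, printed as left_right
print("_".join(a) if a == b else "no shared context")
```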

blueeyes0710

concordance(token) provides you with the context in which a token is used. similar(token) provides you with other words that appear in similar contexts.

To illustrate, here is a more general, approximate description of their functionality:

1) concordance(token): This returns a predefined amount of text to the left and right of your token (let's call this collection of words "Z"). It does this for each instance of your token in the text.

2) similar(token): A word is listed here if it is likely to occur in the same contexts as those collected in "Z".

According to the NLTK docs, a concordance view shows us every occurrence of a given word, together with some context.

similar is used to find other words that appear in a similar range of contexts.

Source: https://stackoverflow.com/questions/43438008/difference-between-similar-and-concordance-in-nltk

Tags

python

nltk

text-processing