Is wordnet path similarity commutative?

Anonymous (unverified), submitted on 2019-12-03 02:00:02

Question:

I am using the wordnet API from nltk. When I compare one synset with another I get None, but when I compare them the other way around I get a float value.

Shouldn't they give the same value? Is there an explanation, or is this a bug in wordnet?

Example:

wn.synset('car.n.01').path_similarity(wn.synset('automobile.v.01'))  # None
wn.synset('automobile.v.01').path_similarity(wn.synset('car.n.01'))  # 0.06666666666666667

Answer 1:

Technically without the dummy root, both car and automobile synsets would have no link to each other:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None

Now, let's look at the dummy root issue closely. Firstly, there is a neat function in NLTK that says whether a synset needs a dummy root:

>>> x._needs_root()
False
>>> y._needs_root()
True

Next, when you look at the path_similarity code (http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity), you can see:

def path_similarity(self, other, verbose=False, simulate_root=True):
    distance = self.shortest_path_distance(other,
               simulate_root=simulate_root and self._needs_root())
    if distance is None or distance

So for the automobile synset, the argument simulate_root=simulate_root and self._needs_root() is always True (with the default simulate_root=True) when you call y.path_similarity(x). When you call x.path_similarity(y), it is always False, because x._needs_root() is False:

>>> True and y._needs_root()
True
>>> True and x._needs_root()
False
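The effect of tying simulate_root to the calling synset can be reproduced without NLTK. The following is a toy sketch, not NLTK's code: the miniature hypernym chains, the helper names, and the rule "verbs need a dummy root, nouns do not" are all assumptions made for illustration, and the numbers it produces are not real WordNet values.

```python
ROOT = "*ROOT*"

# Hypothetical miniature hypernym chains (child -> parent), NOT real WordNet data.
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01",
    "motor_vehicle.n.01": "entity.n.01",
    "entity.n.01": None,
    "automobile.v.01": "travel.v.01",
    "travel.v.01": None,
}


def needs_root(name):
    # Assumption for this toy: verbs need a dummy root, nouns do not.
    return name.split(".")[1] == "v"


def hypernym_distances(name, simulate_root=False):
    # Walk up the chain, recording how far each ancestor is from `name`.
    distances = {}
    node, depth = name, 0
    while node is not None:
        distances[node] = depth
        node = HYPERNYM[node]
        depth += 1
    if simulate_root:
        # Place the dummy root one step beyond the farthest hypernym.
        distances[ROOT] = depth
    return distances


def shortest_path_distance(a, b, simulate_root=False):
    da = hypernym_distances(a, simulate_root)
    db = hypernym_distances(b, simulate_root)
    common = set(da) & set(db)
    if not common:
        return None
    return min(da[n] + db[n] for n in common)


def path_similarity(a, b, simulate_root=True):
    # The root is simulated only if the *calling* synset needs it.
    distance = shortest_path_distance(
        a, b, simulate_root=simulate_root and needs_root(a)
    )
    if distance is None:
        return None
    return 1.0 / (distance + 1)


print(path_similarity("car.n.01", "automobile.v.01"))
print(path_similarity("automobile.v.01", "car.n.01"))
```

In this toy model the noun-first call finds no common ancestor, while the verb-first call connects the two chains through the dummy root, which is the same asymmetry as in the question.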

Now, path_similarity() passes this argument down to shortest_path_distance() (https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance) and then to hypernym_distances(), which collects the hypernyms of each synset together with their distances. Without simulate_root=True, the automobile synset does not connect to the car synset, and vice versa:

>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2),
     (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7),
     (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11),
     (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5),
     (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4),
     (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9),
     (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5),
     (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6),
     (Synset('car.n.01'), 0), (Synset('container.n.01'), 4),
     (Synset('instrumentality.n.03'), 6)])
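These distances are enough to check where the 0.06666666666666667 in the question comes from. The sketch below rests on two assumptions that the listings here do not fully show: that the dummy root sits one step beyond the farthest hypernym (as the verb output suggests, with travel.v.01 at 1 and *ROOT* at 2), and that the score is 1 / (distance + 1). The helper name root_distance is made up for illustration.

```python
# Farthest hypernym distances, read off the hypernym_distances() output above.
VERB_MAX_DISTANCE = 1    # travel.v.01 is 1 step from automobile.v.01
NOUN_MAX_DISTANCE = 11   # entity.n.01 is at most 11 steps from car.n.01


def root_distance(max_hypernym_distance):
    # Assumption: the dummy *ROOT* sits one step beyond the farthest hypernym.
    return max_hypernym_distance + 1


verb_to_root = root_distance(VERB_MAX_DISTANCE)
noun_to_root = root_distance(NOUN_MAX_DISTANCE)

# The only node the two synsets share is the dummy root, so the path runs through it.
distance = verb_to_root + noun_to_root

# Assumption: path similarity is the reciprocal of (distance + 1).
similarity = 1.0 / (distance + 1)

print(distance, similarity)
```

Under these assumptions the path through the root has length 2 + 12 = 14, and 1 / 15 agrees with the value reported in the question.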

So in theory, the correct path_similarity between these two synsets is 0 / None, but because of the simulate_root=simulate_root and self._needs_root() argument, nltk.corpus.wordnet.path_similarity() in NLTK's API is not commutative.

BUT the code is not really wrong or bugged either: any distance measured by going through the root will be uniformly large, because the position of the dummy *ROOT* never changes. So the best practice is to compute path_similarity in both directions and combine the results:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
# When you never want None, since going through the *ROOT*
# will always get you some sort of distance from synset x to synset y.
# (Relies on Python 2 ordering, where None sorts below any number;
# Python 3 raises TypeError when comparing None with a float.)
>>> max(wn.path_similarity(x,y), wn.path_similarity(y,x))
# When you can allow None in synset similarity comparison
>>> min(wn.path_similarity(x,y), wn.path_similarity(y,x))
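On Python 3 the max/min trick fails, because None cannot be compared with a float. A minimal sketch of an order-independent wrapper follows; the function name symmetric_path_similarity and its prefer_none flag are hypothetical, and the only thing it assumes about its arguments is that they expose a path_similarity(other) method, as NLTK synsets do.

```python
def symmetric_path_similarity(a, b, prefer_none=False):
    """Combine both directions of path_similarity into one order-independent score.

    `a` and `b` are any objects with a path_similarity(other) method
    (for example NLTK synsets). This is an illustrative helper, not part of NLTK.
    """
    scores = [a.path_similarity(b), b.path_similarity(a)]
    if prefer_none and None in scores:
        # Mirrors the Python 2 min(...) idiom: None wins if either direction is None.
        return None
    numeric = [s for s in scores if s is not None]
    # Mirrors the Python 2 max(...) idiom: take the largest numeric score, if any.
    return max(numeric) if numeric else None
```

With NLTK loaded, this would be called as symmetric_path_similarity(x, y); swapping the arguments gives the same result by construction.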


Answer 2:

I don't think it is a bug in wordnet per se. In your case, automobile is specified as a verb and car as a noun, so you will need to look through the synsets to see what the graph looks like and decide whether the nets are labeled correctly.

from nltk.corpus import wordnet as wn

A = 'car.n.01'
B = 'automobile.v.01'
C = 'automobile.n.01'

wn.synset(A).path_similarity(wn.synset(B))
wn.synset(B).path_similarity(wn.synset(A))

wn.synset(A).path_similarity(wn.synset(C))  # is 1
wn.synset(C).path_similarity(wn.synset(A))  # is also 1

