NLTK Wordnet Synset for word phrase

别来无恙 submitted on 2019-12-05 08:32:35

Apart from what I said in the comments already, I think the way you select the best hyperonym might be flawed. The synset you end up with is not the lowest common hyperonym of all words, but only that of two of them.

Let's stick with your example of "school & office supplies". For each word in the expression you get a number of synsets. So the variable node_synsets will look something like the following:

[[school_1, school_2], [office_1, office_2, office_3], [supply_1]]

In this example, there are 2 × 3 × 1 = 6 ways to pick one synset for each word:

[(school_1, office_1, supply_1),
 (school_1, office_2, supply_1),
 (school_1, office_3, supply_1),
 (school_2, office_1, supply_1),
 (school_2, office_2, supply_1),
 (school_2, office_3, supply_1)]

These triples are what you iterate over in the outer for loop (with itertools.product). If the expression has 4 words, you iterate over quadruples; with 5 words, quintuples; and so on.
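To make the enumeration concrete, here is a minimal sketch of what the outer loop iterates over. The strings are placeholders standing in for real WordNet synsets, not actual NLTK objects, and the variable name triples is only illustrative.

```python
from itertools import product

# Placeholder names standing in for real WordNet synsets (assumption:
# in the real code these would be NLTK Synset objects).
node_synsets = [
    ["school_1", "school_2"],
    ["office_1", "office_2", "office_3"],
    ["supply_1"],
]

# Pick one synset per word, in every possible way: 2 * 3 * 1 = 6 triples.
triples = list(product(*node_synsets))

for triple in triples:
    print(triple)
```

The number of tuples is the product of the list lengths, so it grows quickly for longer expressions with ambiguous words.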

Now, with the inner for loop, you pair off each triple. The first one is:

[(school_1, office_1),
 (school_1, supply_1),
 (office_1, supply_1)]

... and you determine the lowest hyperonym of each pair. So in the end you get the lowest hyperonym of, say, school_2 and office_1, which might be some kind of institution. This is probably not very meaningful, because a pair always leaves out one word: here, no synset of "supply" is considered at all.
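The pairing step can be sketched as follows, assuming the inner loop does something equivalent to itertools.combinations with a group size of 2 (your actual code may pair the synsets differently). The strings are again placeholders for synsets.

```python
from itertools import combinations

# One triple from the outer loop (placeholder names, not real synsets).
triple = ("school_1", "office_1", "supply_1")

# All unordered pairs within the triple.
pairs = list(combinations(triple, 2))

for pair in pairs:
    # Each pair omits exactly one member of the triple.
    missing = [s for s in triple if s not in pair]
    print(pair, "ignores", missing)
```

Whichever pair wins, its hyperonym was computed without looking at the third word.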

Maybe you should try to find the lowest common hyperonym of all three words, in each combination of their synsets, and take the one scoring best among them.
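Here is a rough sketch of that idea on a made-up toy taxonomy, so it runs without WordNet. The PARENT table, the helper names, and the "deepest common hyperonym wins" scoring are all assumptions for illustration. With NLTK, something similar could probably be built on Synset.common_hypernyms() and Synset.min_depth().

```python
from itertools import product

# Hypothetical toy taxonomy: child -> parent (None marks the root).
# Real code would query WordNet instead of this table.
PARENT = {
    "entity": None,
    "institution": "entity",
    "artifact": "entity",
    "building": "artifact",
    "school_1": "institution",
    "school_2": "building",
    "office_1": "institution",
    "office_2": "building",
    "office_3": "entity",
    "supply_1": "artifact",
}

def path_to_root(node):
    """Return the node followed by all of its hyperonyms, up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Number of steps from the node up to the root."""
    return len(path_to_root(node)) - 1

def lowest_common_hyperonym(nodes):
    """Deepest node that is a hyperonym of (or equal to) every given node."""
    common = set(path_to_root(nodes[0]))
    for node in nodes[1:]:
        common &= set(path_to_root(node))
    return max(common, key=depth)

node_synsets = [
    ["school_1", "school_2"],
    ["office_1", "office_2", "office_3"],
    ["supply_1"],
]

# Score each combination by the depth of the hyperonym shared by ALL members.
best_combo = max(
    product(*node_synsets),
    key=lambda combo: depth(lowest_common_hyperonym(combo)),
)
best_hyperonym = lowest_common_hyperonym(best_combo)
print(best_combo, "->", best_hyperonym)
```

In this toy data, the pair (school_2, office_2) alone would yield "building", which is not a hyperonym of supply_1. Taking all three words into account gives the more general "artifact" instead.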
