Python: Passing variables into Wordnet Synsets methods in NLTK

我们两清 submitted on 2019-12-07 12:20:24

Question


I need to work on a project that requires NLTK, so I started learning Python two weeks ago, but I am struggling to understand both Python and NLTK.

From the NLTK documentation, I can understand the following code, and it works well if I manually type the words apple and pear into it.

from nltk.corpus import wordnet as wn

apple = wn.synset('apple.n.01')
pear = wn.synset('pear.n.01')

print(apple.lch_similarity(pear))

Output: 2.53897387106

However, I need to use NLTK with lists of items. For example, I have the two lists below and I would like to compare every item in list1 with every item in list2: word1 from list1 with every word in list2, then word2 from list1 with every word in list2, and so on until all the words in list1 have been compared.

list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

wordFromList1 = list1[0]
wordFromList2 = list2[0]

wordFromList1 = wn.synset(wordFromList1)
wordFromList2 = wn.synset(wordFromList2)    

print(wordFromList1.lch_similarity(wordFromList2))

The code above of course gives an error. Can anyone show me how to pass a variable into the synset method [wn.synset(*pass_variable_in_here*)] so that I can use a double loop to get the lch_similarity values for each pair? Thank you.


Answer 1:


wordnet.synset expects a 3-part name string of the form: word.pos.nn.

You did not specify the pos.nn part for each word in list1 and list2.
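As a minimal sketch of passing a variable into synset, the three parts of the name can be assembled with string formatting. The helpers synset_name and lookup_synset below are hypothetical names introduced only for illustration, and the optional wn parameter is an assumption made so that the sketch can be exercised without NLTK installed.

```python
def synset_name(word, pos="n", sense=1):
    """Build a WordNet synset name such as 'apple.n.01'.

    Hypothetical helper: `pos` is a one-letter part-of-speech tag
    ('n', 'v', 'a', 'r') and `sense` is the 1-based sense number.
    """
    return "{}.{}.{:02d}".format(word, pos, sense)


def lookup_synset(word, pos="n", sense=1, wn=None):
    """Look up one synset by its 3-part name.

    `wn` defaults to NLTK's WordNet reader, imported only when needed.
    """
    if wn is None:
        # Requires NLTK and the WordNet corpus (nltk.download('wordnet')).
        from nltk.corpus import wordnet as wn
    return wn.synset(synset_name(word, pos, sense))
```

With NLTK and the WordNet corpus available, lookup_synset('apple').lch_similarity(lookup_synset('pear')) would repeat the hand-written comparison from the question, this time starting from plain strings.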

It seems reasonable to assume that all the words are nouns, so we could try appending the string '.n.01' to each string in list1 and list2:

for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synset(word1+'.n.01')
    wordFromList2 = wordnet.synset(word2+'.n.01')

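That does not work, however. wordnet.synset('drinks.n.01') raises a WordNetError, because synset names are built from base forms such as drink, not from inflected forms such as drinks.

On the other hand, the same documentation page shows that you can look up the synsets matching a word with the synsets method.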

For example, wordnet.synsets('drinks') returns the list:

[Synset('drink.n.01'),
 Synset('drink.n.02'),
 Synset('beverage.n.01'),
 Synset('drink.n.04'),
 Synset('swallow.n.02'),
 Synset('drink.v.01'),
 Synset('drink.v.02'),
 Synset('toast.v.02'),
 Synset('drink_in.v.01'),
 Synset('drink.v.05')]
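The list above mixes noun and verb senses, and for an unknown word synsets returns an empty list. A hedged sketch of narrowing the lookup to nouns and guarding against the empty case follows. The pos argument and the wordnet.NOUN constant are part of NLTK's WordNet reader; first_noun_synset is a hypothetical helper name, and the wn parameter is an assumption that lets a stand-in object be passed in.

```python
def first_noun_synset(word, wn=None):
    """Return the first noun synset for `word`, or None if there is none."""
    if wn is None:
        # Requires NLTK and the WordNet corpus (nltk.download('wordnet')).
        from nltk.corpus import wordnet as wn
    # Restrict the lookup to noun senses only.
    candidates = wn.synsets(word, pos=wn.NOUN)
    # Indexing an empty list with [0] would raise IndexError, so check first.
    return candidates[0] if candidates else None
```

Returning None instead of raising leaves it to the calling loop to decide whether to skip a word that WordNet does not know or to report it.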

So at this point, you need to give some thought to what you want the program to do. If you are okay with just picking the first item in this list as a proxy for drinks, then you could use

for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]

which would result in a program that looks like this:

import nltk.corpus as corpus
import itertools as IT

wordnet = corpus.wordnet
list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

for word1, word2 in IT.product(list1, list2):
    # print(word1, word2)
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]
    # In NLTK 3 and later, Synset.name is a method, so it must be called.
    print('{w1}, {w2}: {s}'.format(
        w1=wordFromList1.name(),
        w2=wordFromList2.name(),
        s=wordFromList1.lch_similarity(wordFromList2)))

which yields

apple.n.01, pear.n.01: 2.53897387106
apple.n.01, shell.n.01: 1.07263680226
apple.n.01, movie.n.01: 1.15267950994
apple.n.01, fire.n.01: 1.07263680226
...
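Taking the first synset is only one possible choice. As an alternative that is not part of the original answer, the sketch below scores every pair of noun senses and keeps the highest lch_similarity value. The names best_lch_similarity and similarity_table are hypothetical, and the wn parameter is again an assumption so that the logic can be tried with a stand-in object.

```python
import itertools


def best_lch_similarity(word1, word2, wn=None):
    """Return the highest lch_similarity over all noun-sense pairs, or None."""
    if wn is None:
        # Requires NLTK and the WordNet corpus (nltk.download('wordnet')).
        from nltk.corpus import wordnet as wn
    senses1 = wn.synsets(word1, pos=wn.NOUN)
    senses2 = wn.synsets(word2, pos=wn.NOUN)
    scores = [s1.lch_similarity(s2)
              for s1, s2 in itertools.product(senses1, senses2)]
    # Drop pairs for which no similarity value was produced.
    scores = [score for score in scores if score is not None]
    return max(scores) if scores else None


def similarity_table(words1, words2, wn=None):
    """Map every (word1, word2) pair to its best similarity score."""
    return {(w1, w2): best_lch_similarity(w1, w2, wn)
            for w1, w2 in itertools.product(words1, words2)}
```

Calling similarity_table(list1, list2) would replace the double loop from the question with a single dictionary keyed by word pair. Whether the maximum over senses is a better proxy than the first sense depends on the application.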


Source: https://stackoverflow.com/questions/14336562/python-passing-variables-into-wordnet-synsets-methods-in-nltk
