How to get the WordNet synset given an offset ID?

Anonymous (unverified), submitted 2019-12-03 01:39:01

Question:

I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?

Answer 1:

As of NLTK 3.2.3, there's a public method for doing this:

wordnet.synset_from_pos_and_offset(pos, offset)

In earlier versions you can use:

wordnet._synset_from_pos_and_offset(pos, offset)

This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.

Example:

from nltk.corpus import wordnet as wn
wn._synset_from_pos_and_offset('n', 4543158)
>>> Synset('wagon.n.01')
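The ID in the question ("n#05576222") first has to be split into a POS tag and an integer offset before it can be passed to either method. A minimal sketch of that step follows; the helper names parse_offset_id and synset_from_offset_id are made up for illustration and are not part of NLTK, and the lookup assumes the WordNet corpus has already been downloaded.

```python
def parse_offset_id(offset_id):
    """Split an ID such as 'n#05576222' into ('n', 5576222)."""
    pos, _, digits = offset_id.partition("#")
    return pos, int(digits)


def synset_from_offset_id(offset_id):
    """Look up the synset for an ID such as 'n#05576222' (hypothetical helper)."""
    # Imported here so the parsing helper above works without NLTK installed.
    from nltk.corpus import wordnet as wn

    pos, offset = parse_offset_id(offset_id)
    # Prefer the public method; fall back to the private one on older NLTK.
    lookup = getattr(wn, "synset_from_pos_and_offset", None)
    if lookup is None:
        lookup = wn._synset_from_pos_and_offset
    return lookup(pos, offset)
```

Calling synset_from_offset_id("n#05576222") should then return the synset for the ID in the question, whichever of the two methods your NLTK version provides.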


Answer 2:

For NLTK 3.2.3 or newer, please see donners45's answer.

For older versions of NLTK:

There is no built-in method in NLTK, but you could build a lookup dictionary like this:

from nltk.corpus import wordnet

syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)

offsets_dict[14204095]
>>> Synset('heatstroke.n.01')

You can then pickle the dictionary and load it whenever you need it.

For NLTK versions prior to 3.0, replace the line

offsets_list = [(s.offset(), s) for s in syns]

with

offsets_list = [(s.offset, s) for s in syns]

since prior to NLTK 3.0 offset was an attribute instead of a method.
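One caveat with a dictionary keyed on the offset alone: WordNet stores each part of speech in its own data file, so an offset is only guaranteed to be unique within one part of speech, and two synsets with different POS tags could overwrite each other. A possible variation, sketched here for NLTK 3.0 or newer (where pos() and offset() are methods), keys the dictionary on the (pos, offset) pair instead. The function name build_offset_index is made up for illustration.

```python
def build_offset_index(synsets):
    """Map (pos, offset) pairs to synsets so equal offsets in different POS cannot collide."""
    return {(s.pos(), s.offset()): s for s in synsets}


# Intended usage with NLTK (assumes the WordNet corpus is installed):
#   from nltk.corpus import wordnet
#   index = build_offset_index(wordnet.all_synsets())
#   index[('n', 14204095)]
```

The key uses whatever tag pos() reports for each synset, so look entries up with the same tag.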



Answer 3:

Other than using NLTK, another option is to use the .tab file for the Princeton WordNet from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I normally use the recipe below to access WordNet as a dictionary, with the offset as the key and a ;-delimited string of words as the value:

import codecs

# Returns every key whose ;-delimited value contains the given word.
def getKey(dic, value):
  return [k for k, v in dic.items() if value in v.split(";")]

# Read Open Multi WN's .tab file
def readWNfile(wnfile, option="ss"):
  wn = {}
  with codecs.open(wnfile, "r", "utf8") as reader:
    for l in reader:
      if l[0] == "#": continue  # skip comment lines
      fields = l.rstrip("\n").split("\t")
      if option == "ss":
        k, v = fields[0], fields[2]  # synset ID as key, word as value
      else:
        k, v = fields[2], fields[0]  # word as key, synset ID as value
      if k in wn:
        wn[k] = wn[k] + ";" + v
      else:
        wn[k] = v
  return wn

princetonWN = readWNfile('wn-data-eng.tab')
offset = "n#05576222"
# Convert "n#05576222" to "05576222-n", the ID format used in the .tab file.
offset = offset.split('#')[1] + '-' + offset.split('#')[0]

print(princetonWN[offset].split(";"))
print(getKey(princetonWN, 'heatstroke'))


Answer 4:

You can use of2ss(). For example:

from nltk.corpus import wordnet as wn
syn = wn.of2ss('01580050a')

This returns Synset('necessary.a.01').
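Note that of2ss() expects the offset digits followed by the POS letter, whereas the ID in the question puts the POS first, separated by "#". A small sketch of the conversion follows; the helper name to_of2ss_id is made up for illustration and is not part of NLTK.

```python
def to_of2ss_id(offset_id):
    """Convert an ID such as 'n#05576222' to the '05576222n' form used by of2ss()."""
    pos, _, digits = offset_id.partition("#")
    # Pad to eight digits in case leading zeros were dropped.
    return digits.zfill(8) + pos


# Intended usage with NLTK (assumes the WordNet corpus is installed):
#   from nltk.corpus import wordnet as wn
#   syn = wn.of2ss(to_of2ss_id("n#05576222"))
```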


