Question:
I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?
Answer 1:
As of NLTK 3.2.3, there's a public method for doing this:
wordnet.synset_from_pos_and_offset(pos, offset)
In earlier versions you can use:
wordnet._synset_from_pos_and_offset(pos, offset)
This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.
Example:
from nltk.corpus import wordnet as wn
wn._synset_from_pos_and_offset('n', 4543158)
# returns Synset('wagon.n.01')
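To go straight from the ID format in the question to that call, something like the following might work. It is only a sketch: it assumes the ID always looks like "pos#offset" (as in "n#05576222"), and parse_synset_id and synset_from_id are made-up helper names, not part of NLTK. Offsets are also specific to a WordNet version, so the lookup can only succeed if the ID came from the same version as the installed data.

```python
def parse_synset_id(synset_id):
    """Split an ID such as 'n#05576222' into (pos, offset)."""
    pos, _, digits = synset_id.partition("#")
    # Assumption: the POS tag is one of WordNet's single-letter codes.
    if pos not in {"n", "v", "a", "r", "s"} or not digits.isdigit():
        raise ValueError("unexpected synset ID: %r" % synset_id)
    return pos, int(digits)


def synset_from_id(synset_id):
    """Look up the synset with NLTK 3.2.3+ (needs the WordNet corpus installed)."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose
    pos, offset = parse_synset_id(synset_id)
    return wn.synset_from_pos_and_offset(pos, offset)
```

The parsing step is kept separate so it can be checked without NLTK; on older NLTK versions, swap in wn._synset_from_pos_and_offset as described above.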
Answer 2:
For NLTK 3.2.3 or newer, please see donners45's answer.
For older versions of NLTK:
There is no built-in method in the NLTK but you could use this:
from nltk.corpus import wordnet
syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)
offsets_dict[14204095]
# returns Synset('heatstroke.n.01')
You can then pickle the dictionary and load it whenever you need it.
For NLTK versions prior to 3.0, replace the line
offsets_list = [(s.offset(), s) for s in syns]
with
offsets_list = [(s.offset, s) for s in syns]
since prior to NLTK 3.0 offset was an attribute instead of a method.
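One caveat with keying on the offset alone: offsets are byte positions within WordNet's per-part-of-speech data files, so the same number could in principle turn up under two parts of speech, and one entry would silently overwrite the other. A minimal sketch that keys on (pos, offset) instead is below; build_offset_index and build_wordnet_index are hypothetical helper names, and the s.pos() / s.offset() calls assume NLTK 3.0 or newer.

```python
def build_offset_index(synsets):
    """Map (pos, offset) -> synset for any iterable of synset-like objects."""
    return {(s.pos(), s.offset()): s for s in synsets}


def build_wordnet_index():
    """Build the index over all of WordNet (needs NLTK 3.0+ and its corpus)."""
    from nltk.corpus import wordnet  # imported lazily on purpose
    return build_offset_index(wordnet.all_synsets())
```

With this, a lookup would read index[('n', 14204095)] rather than offsets_dict[14204095]. As far as I know, adjective satellites report their POS as 's', so those would be stored under 's' rather than 'a'.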
Answer 3:
Other than using NLTK, another option is to use the .tab file for the Princeton WordNet from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I normally use the recipe below to access WordNet as a dictionary with the offset as the key and a ;-delimited string of words as the value:
import codecs

# Returns every key whose ";"-delimited value contains the given word.
def getKey(dic, value):
    return [k for k, v in dic.items() if value in v.split(";")]

# Read Open Multi WN's .tab file
def readWNfile(wnfile, option="ss"):
    wn = {}
    with codecs.open(wnfile, "r", "utf8") as reader:
        for l in reader:
            if l[0] == "#":  # skip comment lines
                continue
            fields = l.split("\t")
            if option == "ss":
                k = fields[0]       # ss as key
                v = fields[2][:-1]  # word (trailing newline dropped)
            else:
                v = fields[0]       # ss as value
                k = fields[2][:-1]  # word as key
            try:
                wn[k] = wn[k] + ";" + v
            except KeyError:
                wn[k] = v
    return wn

princetonWN = readWNfile('wn-data-eng.tab')
offset = "n#05576222"
# Convert "n#05576222" to the "05576222-n" form used as keys in the .tab file.
offset = offset.split('#')[1] + '-' + offset.split('#')[0]
print(princetonWN[offset].split(";"))
print(getKey(princetonWN, 'heatstroke'))
Answer 4:
You can use of2ss(). For example:
from nltk.corpus import wordnet as wn
syn = wn.of2ss('01580050a')
will return Synset('necessary.a.01').
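The ID in the question ("n#05576222") is not in the form shown here, so it would need rearranging first. A rough sketch, assuming of2ss() expects an eight-digit zero-padded offset followed by the POS letter, as in '01580050a'; to_of2ss_id is a made-up helper name, not an NLTK function.

```python
def to_of2ss_id(synset_id):
    """Turn 'n#05576222' into '05576222n' (offset zero-padded to 8 digits)."""
    pos, _, digits = synset_id.partition("#")
    return "%08d%s" % (int(digits), pos)
```

The result could then be passed along as wn.of2ss(to_of2ss_id("n#05576222")), provided the offset belongs to the WordNet version that NLTK has installed.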