How to get the WordNet synset given an offset ID?

天命终不由人 2020-12-13 10:51

I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?

4 Answers
  • 2020-12-13 11:04

    For NLTK 3.2.3 or newer, please see donners45's answer.

    For older versions of NLTK:

    NLTK has no built-in method for this, but you could build a lookup dictionary keyed by offset:

    from nltk.corpus import wordnet
    
    syns = list(wordnet.all_synsets())
    offsets_list = [(s.offset(), s) for s in syns]
    offsets_dict = dict(offsets_list)
    
    offsets_dict[14204095]
    # Synset('heatstroke.n.01')
    

    You can then pickle the dictionary and load it whenever you need it.
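
    One way the pickling step could look is sketched below. It is not part of the original recipe: the helper names (`build_index`, `save_index`, `load_index`) and the file name are made up for illustration, the code assumes NLTK 3.0 or newer, and it stores synset names rather than `Synset` objects on the assumption that plain strings are the safer thing to pickle. Keys are `(pos, offset)` pairs because offsets are positions within per-part-of-speech data files, so a bare offset is not guaranteed to be unique across parts of speech.

```python
import pickle


def build_index():
    """Map (pos, offset) -> synset name, e.g. ('n', 14204095) -> 'heatstroke.n.01'."""
    # Imported here so the save/load helpers work without NLTK installed.
    from nltk.corpus import wordnet
    return {(s.pos(), s.offset()): s.name() for s in wordnet.all_synsets()}


def save_index(index, path):
    """Write the index to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Read a previously saved index back into memory."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

    Build the index once with `save_index(build_index(), "wn_offsets.pkl")`. Later, `wordnet.synset(load_index("wn_offsets.pkl")[("n", 14204095)])` turns the stored name back into a synset.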

    For NLTK versions prior to 3.0, replace the line

    offsets_list = [(s.offset(), s) for s in syns]
    

    with

    offsets_list = [(s.offset, s) for s in syns]
    

    since prior to NLTK 3.0 offset was an attribute instead of a method.
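
    If the same script has to run on both sides of that change, the check can be made at run time. The helper below is a hypothetical sketch (`get_offset` is not an NLTK function); it only assumes that `offset` is either a plain integer or a zero-argument method.

```python
def get_offset(synset):
    """Return the synset's offset whether `offset` is a method or an attribute."""
    value = synset.offset
    # NLTK 3.0+ exposes a method; earlier releases expose the integer directly.
    return value() if callable(value) else value
```

    With it, the dictionary can be built as `{get_offset(s): s for s in wordnet.all_synsets()}` on either version.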

  • 2020-12-13 11:18

    You can use of2ss(). For example:

    from nltk.corpus import wordnet as wn
    syn = wn.of2ss('01580050a')
    

    will return Synset('necessary.a.01')
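
    The string in that example is the offset zero-padded to eight digits, followed by the POS letter. When the offset is held as an integer, the padding has to be restored first. A small sketch, where `to_of_id` is a made-up helper name and not part of NLTK:

```python
def to_of_id(pos, offset):
    """Format a POS letter and integer offset like '01580050a'."""
    # Zero-pad to eight digits, matching the form used in the example above.
    return "{:08d}{}".format(offset, pos)
```

    For instance, `wn.of2ss(to_of_id('a', 1580050))` passes the same string as the example above.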

  • 2020-12-13 11:28

    As of NLTK 3.2.3, there's a public method for doing this:

    wordnet.synset_from_pos_and_offset(pos, offset)
    

    In earlier versions you can use:

    wordnet._synset_from_pos_and_offset(pos, offset)
    

    This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.

    Example:

    from nltk.corpus import wordnet as wn
    wn.synset_from_pos_and_offset('n', 4543158)
    # Synset('wagon.n.01')
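
    To apply this to the ID from the question, the `n#05576222` string first has to be split into a POS letter and an integer offset. A minimal sketch, with `parse_offset_id` and `synset_from_id` as illustrative names that are not part of NLTK, and assuming NLTK 3.2.3 or newer for the lookup:

```python
def parse_offset_id(offset_id):
    """Split an ID such as 'n#05576222' into ('n', 5576222)."""
    pos, _, digits = offset_id.partition("#")
    if not pos or not digits.isdigit():
        raise ValueError("expected '<pos>#<offset>', got %r" % offset_id)
    return pos, int(digits)


def synset_from_id(offset_id):
    """Look up the synset for an ID in '<pos>#<offset>' form."""
    # Imported here so parse_offset_id works without NLTK installed.
    from nltk.corpus import wordnet as wn
    pos, offset = parse_offset_id(offset_id)
    return wn.synset_from_pos_and_offset(pos, offset)
```

    Offsets are tied to a specific WordNet release, so the lookup only succeeds if the ID comes from the same release as the installed corpus.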
    
  • 2020-12-13 11:28

    Besides NLTK, another option is to use the .tab file for the Princeton WordNet from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I normally use the recipe below to load WordNet as a dictionary, with the offset as the key and a ;-delimited string of words as the value:

    import codecs

    # Return all keys whose ";"-delimited value contains the given item.
    def getKey(dic, value):
      return [k for k, v in dic.items() if value in v.split(";")]

    # Read Open Multilingual WordNet's .tab file into a dictionary.
    def readWNfile(wnfile, option="ss"):
      wn = {}
      with codecs.open(wnfile, "r", "utf8") as reader:
        for l in reader:
          if l[0] == "#": continue  # skip comment lines
          fields = l.rstrip("\r\n").split("\t")
          if option == "ss":
            k, v = fields[0], fields[2]  # synset ID as key, word as value
          else:
            k, v = fields[2], fields[0]  # word as key, synset ID as value
          if k in wn:
            wn[k] = wn[k] + ";" + v
          else:
            wn[k] = v
      return wn

    princetonWN = readWNfile('wn-data-eng.tab')
    offset = "n#05576222"
    offset = offset.split('#')[1] + '-' + offset.split('#')[0]  # "05576222-n"

    print(princetonWN[offset].split(";"))  # words in the synset with this offset
    print(getKey(princetonWN, 'heatstroke'))  # synset IDs that contain 'heatstroke'
    