How to get the wordnet sense frequency of a synset in NLTK?

一个人的身影 2020-12-31 23:23

According to the documentation, I can load a sense-tagged corpus in NLTK like this:

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = word         


        
2 Answers
  • 2020-12-31 23:52

    I managed to do it this way.

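    from nltk.corpus import wordnet as wn

    word = "dog"
    synsets = wn.synsets(word)

    # Map each sense, keyed as "<offset>-<pos>", to the summed count of its lemmas.
    sense2freq = {}
    for s in synsets:
        freq = 0
        for lemma in s.lemmas():  # lemmas(), offset() and pos() are methods in NLTK 3
            freq += lemma.count()
        sense2freq[str(s.offset()) + "-" + s.pos()] = freq

    for s in sense2freq:
        print(s, sense2freq[s])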
    
  • 2021-01-01 00:01

    If you only need the most frequent sense of a word, you can use wn.synsets(word)[0], since WordNet generally orders senses from most frequent to least frequent.

    (source: Daniel Jurafsky's Speech and Language Processing 2nd edition)
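
To check that ordering against the tagged counts rather than rely on it, a minimal sketch could pick the sense with the highest summed lemma count. The helper names `sense_frequency` and `most_frequent_sense` are hypothetical, not part of NLTK. The sketch only assumes that each synset exposes `lemmas()` and each lemma exposes `count()`, as NLTK 3 synsets and lemmas do.

```python
def sense_frequency(synset):
    """Sum the tagged-corpus counts of every lemma in one synset."""
    return sum(lemma.count() for lemma in synset.lemmas())


def most_frequent_sense(synsets):
    """Return the synset with the highest summed lemma count.

    Returns None for an empty list. On ties, max() keeps the earliest
    synset, so the order of the input list is used as the tie-breaker.
    """
    return max(synsets, key=sense_frequency, default=None)
```

With NLTK and its WordNet data installed, this would be called as `most_frequent_sense(wn.synsets("dog"))`. Many lemmas have a count of 0, so for rare words the result may simply fall back to the first synset.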
