What is the best stemming method in Python?

被撕碎了的回忆 2020-12-12 22:36

I tried all the NLTK stemming methods, but they give me weird results for some words.

Examples

It often cuts off the end of words when it shouldn't:

6 Answers
  •  余生分开走
    2020-12-12 23:05

    All the stemmers discussed here are algorithmic stemmers: they apply suffix-stripping rules without checking the result against a dictionary, so they can always produce unexpected results such as:

    In [3]: from nltk.stem.porter import PorterStemmer
    
    In [4]: stemmer = PorterStemmer()
    
    In [5]: stemmer.stem('identified')
    Out[5]: u'identifi'
    
    In [6]: stemmer.stem('nonsensical')
    Out[6]: u'nonsens'
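
To see why rule-based stemming yields non-words like these, here is a minimal sketch of a suffix-stripping stemmer. It is a toy illustration, not the Porter algorithm: the `SUFFIX_RULES` table and the `toy_stem` function are hypothetical names made up for this example, and the rules were picked only to reproduce the two outputs above.

```python
# Toy rule-based stemmer. NOT the Porter algorithm, only an illustration
# of the general idea: rewrite a suffix without consulting any dictionary.
# The rules below are assumptions chosen for this example.
SUFFIX_RULES = [
    ("ied", "i"),   # identified -> identifi
    ("ical", ""),   # nonsensical -> nonsens
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

def toy_stem(word):
    """Apply the first matching suffix rule; keep at least 3 stem characters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)] + replacement
    return word

for w in ("identified", "nonsensical", "linked"):
    print(w, "->", toy_stem(w))
```

Because nothing checks whether the result is a real word, the rules that work for "linked" also truncate "identified" and "nonsensical" into stems that are not English words.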
    

    To get the correct root words, one needs a dictionary-based stemmer such as the Hunspell stemmer. Python bindings for it are available. Example code:

    >>> import hunspell
    >>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
    >>> hobj.spell('spookie')
    False
    >>> hobj.suggest('spookie')
    ['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
    >>> hobj.spell('spooky')
    True
    >>> hobj.analyze('linked')
    [' st:link fl:D']
    >>> hobj.stem('linked')
    ['link']
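
The dictionary-based idea can be sketched without Hunspell installed. The snippet below is only an approximation of the principle, not how Hunspell works internally: `VOCAB`, `CANDIDATE_RULES`, and `dict_stem` are hypothetical names for this example, and the tiny word set stands in for a real dictionary file.

```python
# Dictionary-checked stemming sketch. A candidate root is accepted only
# if it appears in the vocabulary; otherwise the word is left unchanged.
# VOCAB is a hypothetical stand-in for a real dictionary such as en_US.dic.
VOCAB = {"identify", "link", "nonsense", "spooky"}

# (suffix, replacement) pairs used to propose candidate roots (assumed rules).
CANDIDATE_RULES = [
    ("ied", "y"),   # identified -> identify
    ("ical", "e"),  # nonsensical -> nonsense
    ("ed", ""),     # linked -> link
    ("s", ""),
]

def dict_stem(word, vocab=VOCAB):
    """Return a root found in vocab, or the word itself if none is found."""
    if word in vocab:
        return word
    for suffix, replacement in CANDIDATE_RULES:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            if candidate in vocab:
                return candidate
    return word

for w in ("identified", "nonsensical", "linked", "spookie"):
    print(w, "->", dict_stem(w))
```

The trade-off is coverage: a word missing from the dictionary, like "spookie" here, comes back unchanged instead of being truncated.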
    
