Python stemmer issue: wrong stem

Posted by 两盒软妹~` on 2019-12-04 06:56:05

Question


Hi, I'm trying to stem words with a Python stemmer. I tried Porter and Lancaster, but they have the same problem: they can't correctly stem words that end with "er" or "e".

For example, they stem:

computer -->  comput

rotate   -->  rotat

This is part of the code:

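import re

# Excerpt wrapped in a hypothetical function so that `return` is valid;
# `stops`, `porter` and `st` are defined elsewhere in the asker's code.
def stem_line(line):
    line = line.lower()
    line = re.sub(r'[^a-z0-9 ]', ' ', line)  # keep only letters, digits, spaces
    line = line.split()
    line = [x for x in line if x not in stops]  # drop stop words
    line = [porter.stem(word, 0, len(word) - 1) for word in line]
    # or: line = [st.stem(word) for word in line]
    return line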

Any idea how to fix this problem?


Answer 1:


To quote the page on Wikipedia: in computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma is "produce", but its stem is "produc", because there are words such as "production". So your code is likely giving you correct results. You seem to expect a lemma, which is not what a stemmer produces (except when the lemma happens to equal the stem).
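To make the stem/lemma distinction concrete, here is a minimal sketch in plain Python. It is not the Porter or Lancaster algorithm: `toy_stem`, `toy_lemma`, `SUFFIXES` and `LEMMAS` are hypothetical names invented for illustration, and the suffix list and lookup table are hand-written assumptions. For real text, a dictionary-based lemmatizer (for example NLTK's WordNetLemmatizer, which needs the WordNet corpus downloaded) would take the place of the lookup table.

```python
# Toy illustration of stem vs. lemma; NOT the Porter or Lancaster algorithm.

# Hypothetical suffix list, tried in order (longer suffixes first).
SUFFIXES = ("ation", "tion", "ing", "ed", "er", "e")

# Hypothetical hand-written lemma table; a real lemmatizer consults a
# full dictionary instead.
LEMMAS = {
    "produced": "produce",
    "producing": "produce",
    "rotated": "rotate",
    "computers": "computer",
}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ("computer", "rotate", "produced", "production"):
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemma(w))
```

In this sketch "produced" and "production" both reduce to the stem "produc", which is not a dictionary word, while the lookup maps "produced" to the lemma "produce". If "computer" and "rotate" are the outputs you want, a lemmatizer is the tool to reach for, not a stemmer.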



Source: https://stackoverflow.com/questions/25193212/python-stemmer-issue-wrong-stem
