stemming

Slovenian stemmer for Sphinx

て烟熏妆下的殇ゞ submitted on 2019-11-30 16:00:43
Question: I am looking for a stemming algorithm for the Slovenian language that I can use with Sphinx search. For example, when searching for 'jabolka', I also want results for documents containing 'jabolko', 'jabolki', 'jabolk', etc. I found some references to the existence of a Slovenian stemmer, but I can't find where to download it, and it isn't even for sale anywhere... Another option I've come across is using the wordforms option in the Sphinx source config (http://sphinxsearch.com/docs
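If the wordforms route is taken, the mapping file could be generated from a list of known inflections. The following is a minimal sketch, assuming the plain-text `source form > destination form` line format used by Sphinx wordforms files. The helper name `build_wordforms` and the tiny inflection table are hypothetical and only illustrate the idea.

```python
def build_wordforms(lemmas):
    """Return wordforms lines mapping each inflected form to its lemma."""
    lines = []
    for lemma, forms in sorted(lemmas.items()):
        for form in sorted(set(forms)):
            if form != lemma:  # a lemma does not need to map to itself
                lines.append(f"{form} > {lemma}")
    return lines


# Hypothetical hand-made table: lemma -> inflected forms from the question.
lemmas = {"jabolko": ["jabolka", "jabolki", "jabolk"]}

wordforms = build_wordforms(lemmas)
print("\n".join(wordforms))
```

The resulting lines would be saved to a text file and referenced from the `wordforms` directive of the index configuration. A hand-built table like this only covers the words listed in it, so it is a workaround rather than a replacement for a real stemmer.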

Difference between Lucene stemmers: EnglishStemmer, PorterStemmer, LovinsStemmer

佐手、 submitted on 2019-11-30 15:38:56
Has anybody compared these stemmers from Lucene (package org.tartarus.snowball.ext): EnglishStemmer, PorterStemmer, LovinsStemmer? What are the strong and weak points of the algorithms behind them? When should each of them be used? Or are there other algorithms available for stemming English words? Thanks. The Lovins stemmer is a very old algorithm that is not of much practical use, since the Porter stemmer is much stronger. Based on a quick skim of the source code, it seems PorterStemmer implements Porter's original (1980) algorithm, while EnglishStemmer implements his updated

Is there any stemmer available for Indian languages? [closed]

家住魔仙堡 submitted on 2019-11-30 15:03:16
Question: Is there any implementation of stemmers for Indian languages such as Hindi or Telugu? Answer 1: Hindi Analyzer, with a stemmer, is available in Lucene. It is based on this algorithm (pdf). Answer 2: hindi_stemmer is a Python implementation of the Hindi stemmer described in "A Lightweight Stemmer for Hindi" by

Is there any stemmer available for Indian languages? [closed]

走远了吗. submitted on 2019-11-30 13:47:50
Is there any implementation of stemmers for Indian languages such as Hindi or Telugu? Hindi Analyzer, with a stemmer, is available in Lucene. It is based on this algorithm (pdf). hindi_stemmer is a Python implementation of the Hindi stemmer described in "A Lightweight Stemmer for Hindi" by Ananthakrishnan Ramanathan and Durgesh D Rao.

import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hindi light stemmer: removes number, gender and case suffixes from nouns and adjectives
 */
public class HindiStemmerLight {
    /**
     * A cache of words and their stems
     */
    static private Map<String,

Python ISRIStemmer for Arabic text

∥☆過路亽.° submitted on 2019-11-30 10:39:41
I am running the following code in IDLE (Python). I want to enter an Arabic string and get its stem, but it doesn't work:
>>> from nltk.stem.isri import ISRIStemmer
>>> st = ISRIStemmer()
>>> w = 'حركات'
>>> join = w.decode('Windows-1256')
>>> print st.stem(join).encode('Windows-1256').decode('utf-8')
The result is the same text as in w, 'حركات', which is not the stem. But when I do the following:
>>> print st.stem(u'اعلاميون')
it succeeds and returns the stem, 'علم'. Why doesn't passing a variable to the stem() function return the stem? Ok, I
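One plausible reading of the session above (an assumption, since the quoted answer is cut off) is that `w` is a Python 2 byte string decoded with a codec that may not match the console's actual encoding, whereas the `u'...'` literal is already Unicode. The sketch below uses only the Python 3 standard library to show how decoding the same bytes with a mismatched codec yields different text. The variable names are illustrative.

```python
word = "حركات"

# Bytes as a UTF-8 console would hand them to the program.
raw = word.encode("utf-8")

# Decoding with the matching codec restores the original text.
right = raw.decode("utf-8")

# Decoding with a mismatched codec produces different characters,
# which a stemmer would no longer recognise as the same word.
wrong = raw.decode("windows-1256", errors="replace")

print(right == word)
print(wrong == word)
```

In Python 3, `str` is already Unicode, so a word read as text can be passed to `ISRIStemmer.stem` directly, without the encode/decode round trip used in the question.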

Stemming - code examples or open source projects?

房东的猫 submitted on 2019-11-30 04:04:21
Stemming is something that's needed in tagging systems. I use delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I write software for embedded systems that would be much more functional (helpful to the user) if it included stemming. For instance, Parse, Parser and Parsing should all mean the same thing to whatever system I'm putting them into. Ideally there's a BSD licensed stemmer somewhere, but if not, where do I look to learn the common algorithms and techniques for this? Aside from BSD stemmers, what other open source
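To make the tagging use case concrete, here is a minimal sketch that groups tags by a shared stem. The `toy_stem` function is a deliberately naive suffix stripper written for this illustration; it is not the Porter algorithm or any published stemmer, and its suffix list is an assumption.

```python
from collections import defaultdict

# Illustrative suffix list, checked in this order; not from any published stemmer.
SUFFIXES = ("ing", "er", "es", "s", "e")


def toy_stem(word, min_stem=3):
    """Lowercase the word and strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def group_tags(tags):
    """Map each stem to the list of original tags that share it."""
    groups = defaultdict(list)
    for tag in tags:
        groups[toy_stem(tag)].append(tag)
    return dict(groups)


groups = group_tags(["Parse", "Parser", "Parsing", "Tag", "Tags"])
print(groups)
```

A rule list this small will over-stem and under-stem many English words, so in practice the `toy_stem` call would be swapped for an established stemmer while the grouping logic stays the same.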

Converting plural to singular in a text file with Python

两盒软妹~` submitted on 2019-11-30 03:25:26
Question: I have txt files that look like this: word, 23 Words, 2 test, 1 tests, 4 And I want them to look like this: word, 23 word, 2 test, 1 test, 4 I want to be able to take a txt file in Python and convert plural words to singular. Here's my code: import nltk f = raw_input("Please enter a filename: ") def openfile(f): with open(f,'r') as a: a = a.read() a = a.lower() return a def stem(a): p = nltk.PorterStemmer() [p.stem(word) for word in a] return a def returnfile(f, a): with open(f,'w') as d: d =
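In the code quoted above, `[p.stem(word) for word in a]` iterates over the characters of one long string, and its result is discarded, so `stem` returns its input unchanged. A minimal Python 3 sketch of a line-by-line approach follows. The `singularize` helper is a naive, hypothetical rule used to keep the example dependency-free; a real stemmer's `stem` method could be substituted for it.

```python
def singularize(word):
    """Naive plural-to-singular rule (hypothetical helper, not a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def singularize_lines(lines):
    """Lowercase and singularize the word before the comma on each line."""
    out = []
    for line in lines:
        # Split only at the first comma so the count part is kept verbatim.
        word, sep, count = line.partition(",")
        out.append(singularize(word.strip().lower()) + sep + count)
    return out


sample = ["word, 23", "Words, 2", "test, 1", "tests, 4"]
print("\n".join(singularize_lines(sample)))
```

For a file, the lines could be read with `open(path).read().splitlines()` and the result written back with `"\n".join(...)`.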

stemming library in java [closed]

情到浓时终转凉″ submitted on 2019-11-30 02:40:23
Question: Is there any library for stemming in Java? Answer 1: There is an implementation of Porter's stemmer available on his website. The code is

Need a python module for stemming of text documents

那年仲夏 submitted on 2019-11-30 00:47:33
Question: I need a good Python module for stemming text documents in the pre-processing stage. I found this one, http://pypi.python.org/pypi/PyStemmer/1.0.1, but I cannot find the documentation at the link provided. If anyone knows where to find the documentation, or knows of any other good stemming algorithm, please help. Answer 1: You may want to try NLTK: >>> from nltk import PorterStemmer >>> PorterStemmer().stem('complications') Answer 2: The Python stemming module has implementations of various stemming algorithms like

Difference between Lucene stemmers: EnglishStemmer, PorterStemmer, LovinsStemmer

£可爱£侵袭症+ submitted on 2019-11-29 23:04:10
Question: Has anybody compared these stemmers from Lucene (package org.tartarus.snowball.ext): EnglishStemmer, PorterStemmer, LovinsStemmer? What are the strong and weak points of the algorithms behind them? When should each of them be used? Or are there other algorithms available for stemming English words? Thanks. Answer 1: The Lovins stemmer is a very old algorithm that is not of much practical use, since the Porter stemmer is much stronger. Based on a quick skim of the source code, it seems