porter-stemmer

python nltk — stemming list of sentences/phrases

假如想象 submitted on 2020-12-26 09:02:51
Question: I have a bunch of sentences in a list and want to stem them with the nltk library. I can stem one sentence at a time, but I am having trouble stemming the sentences in a list and joining them back together. Is there a step I am missing? I am quite new to nltk. Thanks!

    import nltk
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    ps = PorterStemmer()

    # Success: one sentence at a time
    data = 'the gamers playing games'
    words = word_tokenize(data)
    for w in words:
        print(ps.stem(w))

    # Fails: data_list
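The excerpt is cut off before the failing list code, so the following is only a minimal sketch of the general pattern the question asks about: stem every token of every sentence, then join the tokens of each sentence back into one string. The helper `stem_sentences`, the stand-in `toy_stem`, and the contents of `data_list` are hypothetical names and data chosen for illustration. Whitespace splitting is assumed in place of `word_tokenize` so that the block runs without NLTK.

```python
from typing import Callable, Iterable, List


def stem_sentences(sentences: Iterable[str], stem: Callable[[str], str]) -> List[str]:
    """Stem each whitespace-separated token, then re-join every sentence with spaces."""
    return [" ".join(stem(token) for token in sentence.split()) for sentence in sentences]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer that strips two suffixes. NOT the Porter algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical input; the original data_list is not visible in the excerpt.
data_list = ["the gamers playing games", "higher scores"]

for stemmed in stem_sentences(data_list, toy_stem):
    print(stemmed)
```

With NLTK installed, the same function should accept `PorterStemmer().stem` as the `stem` argument, and `sentence.split()` can be swapped for `word_tokenize(sentence)` if punctuation-aware tokenization is needed.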

stem function error: stem required one positional argument

假如想象 submitted on 2020-02-04 01:55:47
Question: In the loop below, the stem function raises an error saying that stem requires one positional argument. Why?

    from nltk.stem import PorterStemmer as ps

    text = 'my name is pythonly and looking for a pythonian group to be formed by me iteratively'
    words = word_tokenize(text)
    for word in words:
        print(ps.stem(word))

Answer 1: You need to instantiate a PorterStemmer object:

    from nltk.stem import PorterStemmer as ps
    from nltk.tokenize import word_tokenize

    stemmer = ps()
    text = 'my name is pythonly and looking
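The error in this question can be reproduced without NLTK, which may make the cause easier to see. The sketch below uses a hypothetical class `ToyStemmer` (not part of any library) whose `stem` method has the same calling shape: calling it on the class binds the word to `self` and leaves `word` unfilled, while calling it on an instance works.

```python
class ToyStemmer:
    """Hypothetical stand-in with the same calling shape as a stemmer class."""

    def stem(self, word: str) -> str:
        # Placeholder behaviour; a real stemmer would strip suffixes here.
        return word.lower()


def try_class_call() -> str:
    """Call stem on the class itself and return the resulting error message."""
    try:
        ToyStemmer.stem("Games")  # "Games" is bound to self, so word is missing
    except TypeError as exc:
        return str(exc)
    return "no error"


print(try_class_call())

stemmer = ToyStemmer()  # instantiate first, as the answer suggests
print(stemmer.stem("Games"))
```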

Snowball Stemming: defining Regions

眉间皱痕 submitted on 2020-01-03 21:09:32
Question: I'm trying to understand the Snowball stemming algorithm. The algorithm uses two regions, R1 and R2, which are defined as follows:

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

http://snowball.tartarus.org/texts/r1r2.html

Examples are b e a u t i f u l |<-
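The two definitions quoted in the question can be turned into a few lines of code. This is a minimal sketch of the generic R1/R2 definition only, assuming the vowel set `a e i o u y` and lowercase input. The names `region_start` and `r1_r2` are hypothetical, and the language-specific adjustments that individual Snowball stemmers layer on top are not modelled.

```python
VOWELS = set("aeiouy")  # assumption: the vowel set differs per language


def region_start(word: str, start: int = 0) -> int:
    """Index just past the first non-vowel that follows a vowel, scanning from start.

    Returns len(word) (the null region at the end) if no such non-vowel exists.
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple:
    """Return the R1 and R2 regions of word as substrings."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same rule, applied inside R1
    return word[r1:], word[r2:]


for w in ("beautiful", "beauty", "beau"):
    print(w, r1_r2(w))
```

For `beautiful`, the first non-vowel following a vowel is `t`, so R1 is `iful`; inside R1 the first such non-vowel is `f`, so R2 is `ul`.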

StandardAnalyzer with stemming

半腔热情 submitted on 2019-12-30 07:25:17
Question: Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy and paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there a smarter way? Also, if I want numbers to be ignored, how can I achieve that? Thanks.

Answer 1: If you want to use this combination for English text analysis, then you should use Lucene's EnglishAnalyzer. Otherwise, you could create a new Analyzer that extends the

Stemming English words with Lucene

自闭症网瘾萝莉.ら submitted on 2019-12-28 03:30:08
Question: I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit". The function looks like:

    String stemTerm(String term){ ... }

I've found the Lucene Analyzer, but it looks far too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer

Is it possible to get a natural word after it has been stemmed?

房东的猫 submitted on 2019-12-24 04:41:29
Question: I have the word "play", which after stemming has become "plai". Now I want to get "play" back. Is that possible? I used Porter's stemmer.

Answer 1: A stemmer can process artificial, non-existent words. Would you like those to be returned as elements of the set of all possible words? How would you know that a word doesn't exist and shouldn't be returned? As an option: find a dictionary of all words and their forms. Compute the stem of each of them. Save this projection as a map: ( stem, list of all word
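The dictionary approach suggested in the answer can be sketched as a reverse index from each stem to every word that produces it. The function `build_stem_index` and the table `TOY_STEMS` are hypothetical; the table is a hand-written stand-in for running a real stemmer over a real word list, and its entries are for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_stem_index(words: Iterable[str], stem: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each stem to the list of dictionary words that reduce to it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[stem(word)].append(word)
    return dict(index)


# Hand-written stand-in for a stemmer's output (assumption, for illustration only).
TOY_STEMS = {
    "play": "plai",
    "plays": "plai",
    "played": "plai",
    "playing": "plai",
    "fried": "fri",
}


def lookup_stem(word: str) -> str:
    """Hypothetical stemmer backed by the table above; unknown words pass through."""
    return TOY_STEMS.get(word, word)


index = build_stem_index(TOY_STEMS, lookup_stem)
print(index.get("plai", []))  # every known word that stems to "plai"
```

A stem usually maps back to several words, so the lookup returns candidates rather than a single original; choosing among them needs extra information such as word frequency or context.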

Porter Stemming of fried

送分小仙女□ submitted on 2019-12-24 02:23:03
Question: Why does the Porter stemming algorithm online at http://text-processing.com/demo/stem/ stem "fried" to "fri" and not "fry"? I can't recall any English words whose past tense ends in "ied" and whose base form ends in "i". Is this a bug?

Answer 1: A stem as returned by the Porter stemmer is not necessarily the base form of a verb, or a valid word at all. If that is what you're looking for, you need a lemmatizer instead.

Answer 2: Firstly, a stemmer is not a lemmatizer, see also Stemmers vs

I want a Java Arabic stemmer

感情迁移 submitted on 2019-12-21 01:46:05
Question: I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it applies unwanted formation to words. Is there any other stemmer for Arabic?

Answer 1: Here is a new Arabic stemmer: Assem's Arabic light stemmer, written using the Snowball framework and generated for many languages, including Java. You can use it by downloading libstemmer for Java here.

Answer 2: You can find the Khoja stemmer here: http://zeus.cs.pacificu.edu/shereen/research.htm Direct

Is there a java implementation of Porter2 stemmer

帅比萌擦擦* submitted on 2019-12-17 16:18:09
Question: Do you know of any Java implementation of the Porter2 stemmer (or any better stemmer written in Java)? I know that there is a Java version of Porter (not Porter2) here: http://tartarus.org/~martin/PorterStemmer/java.txt but on http://tartarus.org/~martin/PorterStemmer/ the author mentions that Porter is a bit outdated and recommends using Porter2, available at http://snowball.tartarus.org/algorithms/english/stemmer.html However, my problem is that this Porter2 is written in Snowball (I