porter-stemmer

Override method for a collection of classes implementing an interface

谁说胖子不能爱 submitted on 2019-12-13 15:26:15
Question: I am using scikit-learn and am building a pipeline. Once the pipeline is built, I use GridSearchCV to find the optimal model. I am working with text data, so I am experimenting with different stemmers. I have created a class called Preprocessor that takes a stemmer and a vectorizer class, then attempts to override the vectorizer's build_analyzer method to incorporate the given stemmer. However, I see that GridSearchCV's set_params just directly accesses instance variables -- i.e. it will
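The question is cut off above, so the following is only a guess at the underlying issue. scikit-learn's set_params assigns parameters with setattr, so anything derived from a parameter inside __init__ goes stale after a grid search changes that parameter. A minimal sketch of the usual workaround, reading the stemmer lazily inside build_analyzer, is shown below. StemmingVectorizer and strip_plural_s are hypothetical stand-ins written for this illustration; they are not scikit-learn classes.

```python
class StemmingVectorizer:
    """Hypothetical stand-in for a vectorizer; NOT scikit-learn's class."""

    def __init__(self, stemmer=None):
        # Store the argument unchanged; derive nothing from it here.
        self.stemmer = stemmer

    def set_params(self, **params):
        # Mirrors the setattr-based behaviour described in the question.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def build_analyzer(self):
        # The stemmer is read here, so a later set_params call is honoured.
        stem = self.stemmer if self.stemmer is not None else (lambda tok: tok)
        return lambda doc: [stem(tok) for tok in doc.lower().split()]


def strip_plural_s(word):
    """Toy stemmer for illustration only: drops one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


vectorizer = StemmingVectorizer()
print(vectorizer.build_analyzer()("Cats run"))
vectorizer.set_params(stemmer=strip_plural_s)
print(vectorizer.build_analyzer()("Cats run"))
```

The design choice is simply that __init__ only records its arguments, and every derived object is rebuilt from the current attribute values when it is needed.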

porter stemming algorithm implementation question?

我怕爱的太早我们不能终老 submitted on 2019-12-11 05:36:20
Question: I am trying to implement the Porter stemming algorithm, but I am stuck at this point:
    Step 1b
        (m>0) EED -> EE     feed -> feed
                            agreed -> agree
        (*v*) ED  ->        plastered -> plaster
                            bled -> bled
        (*v*) ING ->        motoring -> motor
                            sing -> sing
Isn't the m of feed equal to 1? feed >> [c]vvc[] >> [c]vc[]. If so, why isn't feed converted to fee? I know that is wrong; can anyone clear this up? You can check the original algorithm here: http://tartarus.org/~martin/PorterStemmer/def.txt Thanks.
Answer 1: m of 'feed' is
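In Porter's definition, a rule written (condition) S1 -> S2 tests the condition on the stem that precedes S1, not on the whole word. For feed the stem before EED is f, whose measure is 0, so the rule does not fire; for agreed the stem is agr, whose measure is 1. A minimal Python sketch of the measure and of this single rule follows. The function names are illustrative and not taken from any library, and input is assumed to be lower-case.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel
    only when the preceding letter is a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m in the form [C](VC)^m[V]: the number of
    vowel-run to consonant-run transitions."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


def step_1b_eed(word):
    """Apply only the rule (m>0) EED -> EE from step 1b."""
    if word.endswith("eed"):
        stem = word[:-3]           # the condition is tested on this part
        if measure(stem) > 0:
            return word[:-1]       # "eed" -> "ee"
    return word


for w in ("feed", "agreed"):
    print(w, "->", step_1b_eed(w), "| stem measure:", measure(w[:-3]))
```

Note that measure("feed") on the whole word is indeed 1, as the question assumes; the rule just never looks at that value.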

Porters Stemming Algorithm Javascript, How to

隐身守侯 submitted on 2019-12-08 01:32:11
Question: Below is Porter's stemming algorithm for JavaScript, which I have taken from here: http://tartarus.org/~martin/PorterStemmer/js.txt I would like to be able to use the algorithm by simply calling:
    var stemmed_word = porter_stemming_alg( "some_word_to_stem" );
Does anyone have any suggestions as to how I would incorporate this functionality into the code below? Something along the lines of:
    function porter_stemming_alg( word ){
        //...
        // algorithm goes here...
        //...
    }
Any suggestions appreciated

Python stemmer issue: wrong stem

两盒软妹~` submitted on 2019-12-04 06:56:05
Question: Hi, I'm trying to stem words with a Python stemmer. I tried Porter and Lancaster, but they have the same problem: they can't correctly stem words that end with "er" or "e". For example, they stem computer --> comput and rotate --> rotat. This is part of the code:
    line = line.lower()
    line = re.sub(r'[^a-z0-9 ]', ' ', line)
    line = line.split()
    line = [x for x in line if x not in stops]
    line = [porter.stem(word, 0, len(word)-1) for word in line]  # or 'line=[ st.stem(word) for word in line]'
    return line
any idea

Stop words and stemmer in java

ⅰ亾dé卋堺 submitted on 2019-12-03 13:56:20
Question: I'm thinking of adding stop-word removal to my similarity program, followed by a stemmer (Porter 1 or 2, depending on which is easier to implement). I read my text from files as whole lines and save them as one long string, so suppose I have two strings, e.g.
    String one = "I decided buy something from the shop.";
    String two = "Nevertheless I decidedly bought something from a shop.";
Now that I have those strings: Stemming: Can I just use the stemmer algorithm directly on it, save

nltk stemmer: string index out of range

此生再无相见时 submitted on 2019-12-03 11:38:13
Question: I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside a Django app view. However, when stemming the documents inside the Django view, I receive an "IndexError: string index out of range" exception from PorterStemmer().stem() for the string 'oed'. As a result, running the following:
    # xkcd_project/search/views.py
    from nltk.stem.porter import PorterStemmer
    def get_results

I want a Java Arabic stemmer

。_饼干妹妹 submitted on 2019-12-03 06:50:29
I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it applies unwanted formation to the words. Is there any other stemmer for Arabic?
Here is a new Arabic stemmer: Assem's Arabic light stemmer, coded using the Snowball framework and generated for many languages, including Java. You can use it by downloading libstemmer for Java here.
You can find the Khoja stemmer here: http://zeus.cs.pacificu.edu/shereen/research.htm Direct download: http://zeus.cs.pacificu.edu/shereen/ArabicStemmerCode.zip https://sourceforge.net/projects/arabicstemmer/

Stop words and stemmer in java

纵饮孤独 submitted on 2019-12-03 04:49:25
I'm thinking of adding stop-word removal to my similarity program, followed by a stemmer (Porter 1 or 2, depending on which is easier to implement). I read my text from files as whole lines and save them as one long string, so suppose I have two strings, e.g.
    String one = "I decided buy something from the shop.";
    String two = "Nevertheless I decidedly bought something from a shop.";
Now that I have those strings: Stemming: Can I just use the stemmer algorithm directly on it, save it as a String and then continue working on the similarity like I did before implementing the stemmer
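A Porter-style stemmer works on one word at a time, so the long string has to be split into tokens before stemming; the results can then be joined back into a single string for the existing similarity code. The flow can be sketched as follows, in Python rather than Java so that it stays short. STOP_WORDS is a deliberately tiny, hypothetical list, and the stem parameter is a placeholder for whichever real stemmer is chosen (it defaults to the identity function here).

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"i", "a", "the", "from"}


def preprocess(text, stem=lambda word: word):
    """Lower-case, tokenise, drop stop words, stem each token, re-join."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(stem(tok) for tok in kept)


one = "I decided buy something from the shop."
two = "Nevertheless I decidedly bought something from a shop."

print(preprocess(one))
print(preprocess(two))
```

Removing stop words before stemming avoids having to stem the stop-word list itself, at the cost of missing stop words that only match after stemming.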

nltk stemmer: string index out of range

回眸只為那壹抹淺笑 submitted on 2019-12-03 01:58:06
I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside a Django app view. However, when stemming the documents inside the Django view, I receive an "IndexError: string index out of range" exception from PorterStemmer().stem() for the string 'oed'. As a result, running the following:
    # xkcd_project/search/views.py
    from nltk.stem.porter import PorterStemmer
    def get_results(request):
        s = PorterStemmer()
        s.stem('oed')
        return render(request, 'list.html')
raises the mentioned error:

StandardAnalyzer with stemming

夙愿已清 submitted on 2019-11-30 23:26:05
Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy and paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there a smarter way? Also, if I would like to ignore numbers, how can I achieve that? Thanks.
ameertawfik: If you want to use this combination for English text analysis, then you should use Lucene's EnglishAnalyzer. Otherwise, you could create a new Analyzer that extends AnalyzerWrapper, as shown below.
    import java.io.IOException;
    import java.io.StringReader;
    import java.util