porter-stemmer

Lucene Porter Stemmer not public

瘦欲@ submitted on 2019-11-29 11:31:30
How do I use the PorterStemmer class in Lucene 3.6.2? Here is what I have: import org.apache.lucene.analysis.PorterStemmer; ... PorterStemmer stemmer = new PorterStemmer(); term = stemmer.stem(term); The compiler tells me: PorterStemmer is not public in org.apache.lucene.analysis; cannot be accessed from outside package. Edit: I have also read extensively about using Snowball, but it isn't encouraged. What is the right way to stem using Lucene in Java? Answer (phani): 1) If you want to use PorterStemmer as part of the Lucene token analysis process, use PorterStemFilter. Sample code: class MyAnalyzer extends Analyzer {

Lucene Porter Stemmer not public

巧了我就是萌 submitted on 2019-11-28 04:45:05
Question: How do I use the PorterStemmer class in Lucene 3.6.2? Here is what I have: import org.apache.lucene.analysis.PorterStemmer; ... PorterStemmer stemmer = new PorterStemmer(); term = stemmer.stem(term); The compiler tells me: PorterStemmer is not public in org.apache.lucene.analysis; cannot be accessed from outside package. Edit: I have also read extensively about using Snowball, but it isn't encouraged. What is the right way to stem using Lucene in Java? Answer 1: 1) If you want to use PorterStemmer as part

Is there a java implementation of Porter2 stemmer

痞子三分冷 submitted on 2019-11-27 21:31:30
Do you know of any Java implementation of the Porter2 stemmer (or any better stemmer written in Java)? I know that there is a Java version of Porter (not Porter2) here: http://tartarus.org/~martin/PorterStemmer/java.txt but on http://tartarus.org/~martin/PorterStemmer/ the author mentions that Porter is a bit outdated and recommends using Porter2, available at http://snowball.tartarus.org/algorithms/english/stemmer.html However, my problem is that this Porter2 is written in Snowball (I had never heard of it before, so I don't know anything about it). What I am looking for is a Java

Stemming algorithm that produces real words

纵然是瞬间 submitted on 2019-11-27 16:53:52
I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I now need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities. I've used an implementation of the Porter Stemmer algorithm (I'm writing in PHP, by the way): http://tartarus.org/~martin/PorterStemmer/php.txt This works, up to a point, but doesn't return "real" words. The example above is stemmed to "commun". I've tried "Snowball" (suggested within another Stack Overflow thread). http://snowball.tartarus.org/demo.php For my example
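One common way to get "real" words out of a stemmer is to use the stem only as a grouping key and then display one of the original words from each group. The sketch below is written in Java (the language used by the other questions on this page) rather than PHP, and it is only an illustration: the class name StemGrouper, the method names, and the toy stemmer are all hypothetical stand-ins, and a real Porter or Snowball implementation would be plugged in where toyStem is passed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class StemGrouper {

    // Groups words by stem and keeps the shortest original word per stem,
    // so the displayed tag is always a word that occurred in the text.
    static Map<String, String> representatives(List<String> words,
                                               Function<String, String> stemmer) {
        Map<String, String> best = new LinkedHashMap<>();
        for (String word : words) {
            String lower = word.toLowerCase(Locale.ROOT);
            String stem = stemmer.apply(lower);
            String current = best.get(stem);
            if (current == null || lower.length() < current.length()) {
                best.put(stem, lower);
            }
        }
        return best;
    }

    // Hypothetical toy stemmer used only to make the sketch runnable.
    // It is NOT the Porter algorithm; substitute a real stemmer here.
    static String toyStem(String word) {
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.endsWith("y")) {
            return word.substring(0, word.length() - 1);
        }
        if (word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Communities", "Community", "tags", "tag");
        Map<String, String> reps = representatives(words, StemGrouper::toyStem);
        // Prints the de-duplicated tags: [community, tag]
        System.out.println(new ArrayList<>(reps.values()));
    }
}
```

Picking the shortest word is just one possible rule; choosing the most frequent original form is another reasonable option.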

Stemming English words with Lucene

孤人 submitted on 2019-11-27 11:22:26
I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit". The function looks like: String stemTerm(String term){ ... } I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business... EDIT: I actually need stemming + lemmatization. Can Lucene do this? import org.apache.lucene
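To give a feel for what a stemmer does with a plural such as "amenities", here is a minimal sketch of step 1a of the original Porter algorithm only (the plural-suffix rules SSES -> SS, IES -> I, SS -> SS, S -> nothing). It is not a complete stemmer and it does not use Lucene; the class and method names are hypothetical, and a full implementation applies several further steps after this one.

```java
class PorterStep1a {

    // Applies only step 1a of the original Porter algorithm.
    // Rules are tried longest suffix first; the first match wins.
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // sses -> ss
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ies -> i
        }
        if (word.endsWith("ss")) {
            return word;                                 // ss -> ss
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // s -> (removed)
        }
        return word;
    }

    public static void main(String[] args) {
        String[] samples = {"caresses", "ponies", "caress", "cats", "amenities"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + step1a(sample));
        }
    }
}
```

Even this first step already produces non-words such as "poni", which is why stemmer output is normally used as an index key and not shown to users.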

Stemming algorithm that produces real words

醉酒当歌 submitted on 2019-11-27 04:10:26
Question: I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I now need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities. I've used an implementation of the Porter Stemmer algorithm (I'm writing in PHP, by the way): http://tartarus.org/~martin/PorterStemmer/php.txt This works, up to a point, but doesn't return "real" words. The example above is stemmed to "commun". I've tried "Snowball"