Approximate regular expression library for Java?

依然范特西╮ submitted on 2019-12-08 19:58:27
Gunslinger47

I found these answers elsewhere on this site for similar problems.

Commons Lang has an implementation of Levenshtein distance.
http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringUtils.html

Commons Codec has an implementation of soundex and metaphone.
http://commons.apache.org/codec/api-release/org/apache/commons/codec/language/Soundex.html
http://commons.apache.org/codec/api-release/org/apache/commons/codec/language/Metaphone.html

(source)
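If you'd rather not add a dependency just for this, Levenshtein distance is short to write by hand. Below is a minimal standalone sketch. It is not the Commons Lang source, which exposes the metric as `StringUtils.getLevenshteinDistance`. It assumes unit cost for every insertion, deletion and substitution, and it keeps only two rows of the dynamic-programming table instead of the full matrix. The class name `Levenshtein` is hypothetical.

```java
/**
 * Sketch: Levenshtein edit distance with unit costs, using two rolling
 * rows of the DP table (O(n) extra memory, O(m * n) time).
 */
final class Levenshtein {

    private Levenshtein() {}

    static int distance(String a, String b) {
        // At the start of iteration i, prev[j] = distance(a[0..i-1), b[0..j))
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // delete a[i-1]
                                 curr[j - 1] + 1), // insert b[j-1]
                        prev[j - 1] + cost);       // substitute, or match for free
            }
            // The row just filled becomes "previous" for the next i
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` returns 3 (two substitutions and one insertion). It compares UTF-16 `char`s, so a character outside the Basic Multilingual Plane counts as two.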

Lucene (http://lucene.apache.org/) also implements Levenshtein edit distance.

(source: zarawesome)

It so happens I reinvented this wheel many years ago - in a FORTRAN program on a mainframe :)

When I proudly told other people on the Internet about my algorithm, they laughed and pointed me at the two (four?) big names in this area:

These are algorithms for comparing huge sequences of similar strings. Memory use is roughly proportional to m + n, where m and n are the lengths of the two strings, and running time is roughly proportional to m * n.
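To make that cost profile concrete, here's a sketch of one classic computation that has it: the length of the longest common subsequence, filled in row by row while keeping only the previous row. It is not necessarily one of the algorithms referred to above, and it returns only the length. Recovering the subsequence itself within linear memory needs an extra divide-and-conquer step, which is omitted here. The class and method names are hypothetical.

```java
/**
 * Sketch: length of the longest common subsequence of a and b in
 * O(m * n) time, keeping only two DP rows (O(n) extra memory).
 */
final class LcsLength {

    private LcsLength() {}

    static int lcsLength(String a, String b) {
        int[] prev = new int[b.length() + 1]; // row i-1 of the table
        int[] curr = new int[b.length() + 1]; // row i, being filled
        // Column 0 is never written, so it stays 0 (LCS with "" is empty)
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1; // extend a common subsequence
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]); // skip a char of a or of b
                }
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `LcsLength.lcsLength("ABCBDAB", "BDCABA")` returns 4; "BCBA" is one common subsequence of that length.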

Gunslinger47 mentions Levenshtein, Soundex and Metaphone. Levenshtein is also a powerful way of computing string distances, but it's better suited to "normal" text. Soundex and Metaphone compute a short code intended to represent how a word sounds when spoken aloud. They become ineffective after about three syllables; they're really intended for words in a human language rather than for strings such as genome sequences.
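To see why Soundex stops discriminating on longer strings: its output is always one letter followed by three digits, so everything past the first few consonant sounds is discarded. Below is a simplified American Soundex sketch. It is a standalone illustration, not Commons Codec's implementation (which you'd normally call as `new Soundex().encode(...)`), and it may disagree with Commons Codec on edge cases. The class name `SimpleSoundex` is hypothetical.

```java
/**
 * Sketch of simplified American Soundex: first letter, then up to three
 * consonant-group digits, padded with zeros to four characters.
 */
final class SimpleSoundex {

    // Code for each letter A..Z. '0' = vowel or Y (dropped, but separates
    // repeated codes); '-' = H or W (dropped, does not separate them).
    private static final String CODES = "0123012-02245501262301-202";

    private SimpleSoundex() {}

    static String encode(String word) {
        StringBuilder out = new StringBuilder(4);
        char last = 0; // code of the most recent letter that counts
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // skip non-letters
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // keep the first letter itself
                last = code;   // but remember its code to suppress a repeat
            } else if (code == '0') {
                last = '0';    // vowel: the next consonant is kept even if its code repeats
            } else if (code != '-' && code != last) {
                out.append(code);
                last = code;
            }
            // H and W fall through without touching 'last'
        }
        if (out.length() == 0) {
            return ""; // input contained no letters
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes, e.g. "Lee" -> "L000"
        }
        return out.toString();
    }
}
```

Both `SimpleSoundex.encode("Robert")` and `SimpleSoundex.encode("Rupert")` return `R163`. Similar-sounding names are meant to collide, which is useful for surnames but not for long or non-linguistic strings.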

EDIT

Oops, I just found my four big names at the bottom of the article you cited, so you're already aware of them. A search for those names plus "Java" should turn up implementations. Here's the first one I found.
