suffix-tree

Generalized Suffix Tree Java Implementation [closed]

让人想犯罪 __ submitted on 2019-12-17 10:46:26
Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 4 years ago. I am looking for a Java implementation of the Generalized Suffix Tree (GST) with the following features: After creating the GST from, say, 1000 strings, I would like to find out how many of these 1000 strings contain some other string 's'. The search must be quite fast, as I need to apply the search on about …
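A minimal sketch of the counting query, assuming a naive generalized suffix trie (one node per character) instead of a compressed generalized suffix tree: every node records the ids of the input strings whose suffixes pass through it, so the answer for 's' is the size of that set at the node reached by walking 's' from the root. The class and method names are hypothetical. Memory grows quadratically with string length, so for 1000 long strings a real GST that keeps the same per-node information on compressed edges would be needed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class: a naive generalized suffix trie (one node per character,
// quadratic size), not a compressed, linear-time generalized suffix tree.
class GeneralizedSuffixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final Set<Integer> stringIds = new HashSet<>(); // inputs whose suffixes pass through this node
    }

    private final Node root = new Node();
    private final int stringCount;

    GeneralizedSuffixTrie(List<String> strings) {
        stringCount = strings.size();
        for (int id = 0; id < strings.size(); id++) {
            String s = strings.get(id);
            // Insert every suffix of s and tag each node on its path with the string's id.
            for (int start = 0; start < s.length(); start++) {
                Node node = root;
                for (int i = start; i < s.length(); i++) {
                    node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
                    node.stringIds.add(id);
                }
            }
        }
    }

    // Number of input strings containing pattern; the query walks |pattern| nodes.
    int countStringsContaining(String pattern) {
        if (pattern.isEmpty()) return stringCount; // every string contains ""
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.get(pattern.charAt(i));
            if (node == null) return 0;
        }
        return node.stringIds.size();
    }
}
```

For the inputs "banana", "ananas" and "bandana", countStringsContaining("nan") is 2 and countStringsContaining("ana") is 3.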

Find K-most longest common suffix in a set of strings

╄→尐↘猪︶ㄣ submitted on 2019-12-13 04:02:02
Question: I want to find the longest common suffixes in a set of strings to detect potentially important morphemes in my natural language processing project. Given a frequency K >= 2, find the K-most common longest suffixes in a list of strings S1, S2, S3, ..., SN. To simplify the problem, here are some examples:
Input1: K=2, S=["fireman","woman","businessman","policeman","businesswoman"]
Output1: ["man","eman","woman"]
Explanation1: "man" occurs 5 times, "eman" occurs 2 times, "woman" occurs 2 times.
It would be …
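One reading of the question that reproduces the example output (an assumption, since the snippet is cut off): count how many words end with each suffix x, keep x when at least K words end with it, and drop it when some one-character-longer suffix cx ends exactly as many words, because then x never occurs without cx. This brute-force sketch uses a hash map instead of a suffix tree, and the class and method names are hypothetical. In a trie of the reversed words, the kept suffixes are the nodes whose count is larger than that of every child.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for one interpretation of "K-most common longest suffixes".
class CommonSuffixes {
    static List<String> find(List<String> words, int k) {
        // count.get(x) = number of words that end with x
        Map<String, Integer> count = new HashMap<>();
        for (String w : words) {
            for (int i = 0; i < w.length(); i++) {
                count.merge(w.substring(i), 1, Integer::sum);
            }
        }
        // bestExtension.get(x) = largest count among suffixes exactly one character longer than x
        Map<String, Integer> bestExtension = new HashMap<>();
        for (Map.Entry<String, Integer> e : count.entrySet()) {
            String longer = e.getKey();
            if (longer.length() >= 2) {
                bestExtension.merge(longer.substring(1), e.getValue(), Math::max);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : count.entrySet()) {
            int c = e.getValue();
            // Frequent enough, and no single longer suffix accounts for all its occurrences.
            if (c >= k && bestExtension.getOrDefault(e.getKey(), 0) < c) {
                result.add(e.getKey());
            }
        }
        // Shortest first, ties broken alphabetically, so the output is deterministic.
        result.sort((a, b) -> a.length() != b.length()
                ? Integer.compare(a.length(), b.length())
                : a.compareTo(b));
        return result;
    }
}
```

With K=2 on the example list this returns [man, eman, woman]; with K=3 only [man] remains.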

Ukkonen's suffix tree algorithm in plain English

时间秒杀一切 submitted on 2019-12-12 11:06:40
Question: I feel a bit thick at this point. I've spent days trying to fully wrap my head around suffix tree construction, but because I don't have a mathematical background, many of the explanations elude me as they start to make excessive use of mathematical symbols. The closest to a good explanation that I've found is Fast String Searching With Suffix Trees, but he glosses over various points, and some aspects of the algorithm remain unclear. A step-by-step explanation of this algorithm here on …
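For reference, a compact Java sketch of Ukkonen's construction in the usual "active point" formulation: an active node, active edge and active length mark where the next insertion happens, a remainder counter tracks suffixes not yet inserted explicitly, suffix links let the active point jump between suffixes, and leaf edges end at min(end, pos + 1) so every leaf grows automatically at each step. It follows the structure of commonly circulated compact implementations rather than the linked article, the class name is hypothetical, and it builds the implicit tree (no terminal character), which is enough for substring queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; Ukkonen's online construction with an active point.
class UkkonenSuffixTree {
    private static final int OPEN = Integer.MAX_VALUE / 2; // end marker for leaf edges

    private static final class Node {
        int start, end, link; // edge label is text[start, min(end, pos + 1)); link 0 = none
        final Map<Character, Integer> next = new HashMap<>();
        Node(int start, int end) { this.start = start; this.end = end; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private final int root;
    private int pos = -1, needLink, remainder, activeNode, activeEdge, activeLength;

    UkkonenSuffixTree(String s) {
        nodes.add(null); // index 0 is reserved to mean "no node"
        root = newNode(-1, -1);
        activeNode = root;
        for (int i = 0; i < s.length(); i++) extend(s.charAt(i));
    }

    private int newNode(int start, int end) {
        nodes.add(new Node(start, end));
        return nodes.size() - 1;
    }

    private int edgeLength(int v) {
        return Math.min(nodes.get(v).end, pos + 1) - nodes.get(v).start;
    }

    // Point the suffix link of the previously created internal node at v.
    private void addLink(int v) {
        if (needLink > 0) nodes.get(needLink).link = v;
        needLink = v;
    }

    // Skip/count: if the active point covers the whole edge to v, move to v.
    private boolean walkDown(int v) {
        int len = edgeLength(v);
        if (activeLength < len) return false;
        activeEdge += len;
        activeLength -= len;
        activeNode = v;
        return true;
    }

    private void extend(char c) {
        text.append(c);
        pos++;
        needLink = 0;
        remainder++; // suffixes still waiting to be inserted explicitly
        while (remainder > 0) {
            if (activeLength == 0) activeEdge = pos;
            char edgeChar = text.charAt(activeEdge);
            Integer nxt = nodes.get(activeNode).next.get(edgeChar);
            if (nxt == null) {
                nodes.get(activeNode).next.put(edgeChar, newNode(pos, OPEN)); // new leaf
                addLink(activeNode);
            } else {
                if (walkDown(nxt)) continue;
                Node child = nodes.get(nxt);
                if (text.charAt(child.start + activeLength) == c) {
                    activeLength++; // suffix already present implicitly: end this phase
                    addLink(activeNode);
                    break;
                }
                int split = newNode(child.start, child.start + activeLength); // split the edge
                nodes.get(activeNode).next.put(edgeChar, split);
                nodes.get(split).next.put(c, newNode(pos, OPEN));
                child.start += activeLength;
                nodes.get(split).next.put(text.charAt(child.start), nxt);
                addLink(split);
            }
            remainder--;
            if (activeNode == root && activeLength > 0) {
                activeLength--;
                activeEdge = pos - remainder + 1;
            } else {
                int link = nodes.get(activeNode).link;
                activeNode = link > 0 ? link : root;
            }
        }
    }

    // p is a substring of the text iff walking p from the root never falls off the tree.
    boolean contains(String p) {
        int v = root, i = 0;
        while (i < p.length()) {
            Integer child = nodes.get(v).next.get(p.charAt(i));
            if (child == null) return false;
            Node n = nodes.get(child);
            for (int k = n.start; k < Math.min(n.end, pos + 1) && i < p.length(); k++, i++) {
                if (text.charAt(k) != p.charAt(i)) return false;
            }
            v = child;
        }
        return true;
    }
}
```

For example, new UkkonenSuffixTree("abcabxabcd").contains("bxab") is true and contains("abd") is false.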

Devel::CheckLib syntax errors when trying to install Tree::Suffix

我与影子孤独终老i submitted on 2019-12-11 23:24:55
Question: I'm trying to install the Tree::Suffix module from CPAN on a Debian testing system with Perl 5.18.1 installed. During compilation, I get a bunch of syntax errors and warnings related to Devel::CheckLib:
CPAN.pm: Building G/GR/GRAY/Tree-Suffix-0.21.tar.gz
syntax error at inc/Devel/CheckLib.pm line 164, near "$mm_attr_key qw(LIBS INC)"
syntax error at inc/Devel/CheckLib.pm line 171, near "}"
Global symbol "%args" requires explicit package name at inc/Devel/CheckLib.pm line 175.
syntax error …

Why do we need a sentinel character in a Suffix Tree?

一个人想着一个人 submitted on 2019-12-10 15:32:58
Question: Why do we need to append "$" to the original string when we implement a suffix tree?
Answer 1: There can be special reasons for appending one (or even more) special characters to the end of the string when specific construction algorithms are used, both for suffix trees and for suffix arrays. However, the most fundamental underlying reason in the case of suffix trees is a combination of two properties of suffix trees: Suffix trees are PATRICIA trees, i.e. the edge labels are, unlike the …
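A small illustration of the standard consequence: without a terminator, any suffix that is also a prefix of a longer suffix ends partway along a path instead of at a leaf of its own, so the tree no longer has exactly one leaf per suffix. The hypothetical helper below lists such suffixes by brute force; appending a character that occurs nowhere else in the string, such as '$', empties the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: suffixes of s that would get no leaf in a suffix tree of s,
// because each is a prefix of some longer suffix of s.
class SentinelDemo {
    static List<String> suffixesWithoutLeaf(String s) {
        List<String> hidden = new ArrayList<>();
        for (int i = 1; i < s.length(); i++) {
            String suffix = s.substring(i);
            // Does any longer suffix s[j..] (j < i) start with this suffix?
            for (int j = 0; j < i; j++) {
                if (s.startsWith(suffix, j)) {
                    hidden.add(suffix);
                    break;
                }
            }
        }
        return hidden;
    }
}
```

For "xabxa" this returns [xa, a]; for "xabxa$" it returns an empty list.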

Stuck finding deepest path in general tree traversal trying to find largest common substring

▼魔方 西西 submitted on 2019-12-07 15:31:08
Question: I am trying to solve the problem of finding the longest common substring of two strings. I will reduce my problem to the following: I created a generalized suffix tree, and as I understand it, the longest common substring is the deepest path consisting of nodes that belong to both strings. My test input is: String1 = xabc, String2 = abc. The tree I build seems to be correct, but my problem is the following method (I pass the root of the tree initially): private void getCommonSubstring(SuffixNode …
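Since getCommonSubstring is cut off, here is an independent sketch of the approach the question describes, assuming a naive generalized suffix trie (quadratic size) instead of a compressed tree: each node stores a bitmask of which input strings reach it, the DFS only descends into nodes marked by both strings, and the label of the deepest node reached is the longest common substring. Pruning is safe because if a path label occurs in both strings, so does every prefix of it. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: longest common substring via a naive generalized suffix trie.
class LongestCommonSubstring {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int mask; // bit 1: reached by a suffix of the first string, bit 2: of the second
    }

    static String of(String a, String b) {
        Node root = new Node();
        insertSuffixes(root, a, 1);
        insertSuffixes(root, b, 2);
        return deepestShared(root, new StringBuilder(), "");
    }

    private static void insertSuffixes(Node root, String s, int bit) {
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int i = start; i < s.length(); i++) {
                node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
                node.mask |= bit;
            }
        }
    }

    // DFS restricted to nodes marked by both strings; returns the longest path label seen.
    private static String deepestShared(Node node, StringBuilder path, String best) {
        if (path.length() > best.length()) best = path.toString();
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (e.getValue().mask == 3) {
                path.append(e.getKey());
                best = deepestShared(e.getValue(), path, best);
                path.deleteCharAt(path.length() - 1); // backtrack
            }
        }
        return best;
    }
}
```

LongestCommonSubstring.of("xabc", "abc") returns "abc". When several substrings tie for the longest, HashMap iteration order decides which one is returned.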

Longest maximum repeating substring

穿精又带淫゛_ submitted on 2019-12-06 15:41:14
A substring can be of length 1, 2, 3, ... The question that I was trying to solve involved finding the substring that occurred the maximum number of times, so it basically broke down to finding the character with the maximum frequency. However, I found out that I can find the longest repeating substring using a suffix tree in O(n). But a suffix tree finds that substring by giving priority to length. I wanted to find the substrings that occur the most times and, among those, the longest one. For example, in the following string: ABCZLMNABCZLMNABC A suffix tree …
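A brute-force sketch of the two-step criterion (highest frequency first, then length), assuming overlapping occurrences count. Because every occurrence of a substring begins with an occurrence of its first character, the maximum frequency f is always reached by a single character. A substring occurs at least f times exactly when it is a common prefix of f consecutive suffixes in sorted order, and the common prefix of such a window is the longest common prefix of its first and last suffix. The suffixes are sorted naively here instead of with a suffix tree or suffix array, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: longest substring among those with the highest occurrence count.
class MostFrequentLongest {
    static String find(String s) {
        int n = s.length();
        if (n == 0) return "";
        // f = highest single-character frequency = highest frequency of any substring.
        int[] freq = new int[Character.MAX_VALUE + 1];
        int f = 0;
        for (int i = 0; i < n; i++) f = Math.max(f, ++freq[s.charAt(i)]);
        // Suffix start positions in lexicographic order (naive O(n^2 log n) sort).
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Arrays.sort(sa, (x, y) -> s.substring(x).compareTo(s.substring(y)));
        // For each window of f consecutive sorted suffixes, their common prefix is
        // LCP(first, last); it occurs at least f times, hence exactly f times.
        String best = "";
        for (int i = 0; i + f - 1 < n; i++) {
            int a = sa[i], b = sa[i + f - 1], len = 0;
            while (a + len < n && b + len < n && s.charAt(a + len) == s.charAt(b + len)) len++;
            if (len > best.length()) best = s.substring(a, a + len);
        }
        return best;
    }
}
```

For "ABCZLMNABCZLMNABC" this returns "ABC", which occurs three times, as often as "A".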

Longest palindromic substring and suffix trie

a 夏天 submitted on 2019-12-06 13:44:32
I was googling a rather well-known problem, namely the longest palindromic substring. I have found links that recommend suffix tries as a good solution to the problem (for example, SO and Algos). The approach, as I understand it, is: for a string S, create Sr (S reversed) and then build a generalized suffix trie. Then find the longest common substring of S and Sr, which is the path from the root to the deepest node that belongs to both S and Sr. So the suffix trie solution essentially reduces to the longest common substring problem. My question is the …
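One known subtlety of this reduction: the longest common substring of S and Sr is not always a palindrome. For "abacdfgdcaba" it is "abacd", because "abacd" and its reverse "dcaba" both occur in S, but at different places. A common fix is to accept a match only when its copy in Sr maps back to the same positions in S. The sketch below shows this with a dynamic-programming longest common substring (O(n^2) time and memory) swapped in for the suffix trie; the method name and flag are hypothetical.

```java
// Hypothetical helper: longest common substring of s and reverse(s), optionally keeping
// only matches that occupy the same positions in s (which makes them palindromes).
class LongestPalindrome {
    static String lcsWithReverse(String s, boolean samePositionOnly) {
        int n = s.length();
        String r = new StringBuilder(s).reverse().toString();
        // dp[i][j] = length of the longest common suffix of s[0, i) and r[0, j)
        int[][] dp = new int[n + 1][n + 1];
        int bestLen = 0, bestEnd = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                if (s.charAt(i - 1) != r.charAt(j - 1)) continue; // dp[i][j] stays 0
                dp[i][j] = dp[i - 1][j - 1] + 1;
                int len = dp[i][j];
                // r[j - len, j) is the reverse of s[n - j, n - j + len); it is the same
                // stretch as s[i - len, i) only when the two start indices agree.
                boolean aligned = i - len == n - j;
                if (len > bestLen && (!samePositionOnly || aligned)) {
                    bestLen = len;
                    bestEnd = i;
                }
            }
        }
        return s.substring(bestEnd - bestLen, bestEnd);
    }
}
```

lcsWithReverse("abacdfgdcaba", false) returns "abacd", while lcsWithReverse("abacdfgdcaba", true) returns the palindrome "aba".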

Suffix tree library for c++ with simple examples how to use it

血红的双手。 submitted on 2019-12-06 07:22:49
Question: I'm searching for a suffix tree library (with linear-time construction), and all I found is PATL, but PATL has no documentation and I can't figure out any of the examples. So is there a suffix tree library for C++ that has decent documentation? PATL home: http://code.google.com/p/patl/ EDIT: Motivation: I need to process a large number of strings, find the frequent common substrings, and report if more than n occurrences of any substring occurred within t seconds. I implemented a tree …

Haskell Data Type With References

会有一股神秘感。 submitted on 2019-12-04 21:45:18
Question: I'm implementing Ukkonen's algorithm, which requires that all leaves of a tree contain a reference to the same integer, and I'm doing it in Haskell to learn more about the language. However, I'm having a hard time writing a data type that does this.
-- Node has children, indexes of info on the edge
-- to it, and an optional suffix link.
-- Leaf has a beginning index of the info, but the
-- end index is always an incrementing variable index.
data STree = Node [STree] (Int, Int) (Maybe
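The sharing the question asks for is Ukkonen's "global end" trick: every leaf points at one mutable end value, so a single increment extends all leaves at once. In Haskell this usually means a mutable reference such as an IORef or STRef, or leaving the end out of leaves entirely and supplying the current end when an edge is read. The Java sketch below only illustrates the sharing itself, with hypothetical End and Leaf classes.

```java
// Hypothetical types illustrating the "global end" idea: leaves keep a reference
// to one shared End object instead of their own copy of the end index.
class GlobalEndDemo {
    static final class End {
        int value; // current end position, shared by all leaves
    }

    static final class Leaf {
        final int start;
        final End end; // shared reference, never copied

        Leaf(int start, End end) {
            this.start = start;
            this.end = end;
        }

        // Length of this leaf's edge label; changes whenever the shared end moves.
        int labelLength() {
            return end.value - start;
        }
    }
}
```

After creating two leaves with starts 0 and 2 that share one End, setting end.value to 5 gives label lengths 5 and 3, and a single end.value++ makes them 6 and 4.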