suffix-tree

Longest Non-Overlapping Repeated Substring using Suffix Tree/Array (Algorithm Only)

て烟熏妆下的殇ゞ submitted on 2019-12-04 17:56:53
Question: I need to find the longest non-overlapping repeated substring in a string. I have the suffix tree and suffix array of the string available. When overlapping is allowed, the answer is trivial (the deepest internal node in the suffix tree). For example, for the string "acaca", the answer is "aca" if overlapping is allowed, but "ac" or "ca" if it is not. I need only the algorithm or a high-level idea. P.S.: I tried, but I couldn't find a clear answer on the web. Answer 1: Generate
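A simple baseline for this question, which is not a suffix-tree method: a quadratic dynamic program over pairs of end positions, where the common-suffix length along a diagonal is only extended while it stays no longer than the gap between the two occurrences. The function name is hypothetical; treat this as a sketch for checking a faster solution against, not as the intended answer.

```python
def longest_nonoverlapping_repeat(s):
    """Return a longest substring that occurs twice without overlapping.

    O(n^2) time and memory; a brute-force baseline, not a suffix-tree algorithm.
    """
    n = len(s)
    # dp[i][j] (i < j): length of the common suffix of s[:i] and s[:j],
    # reset to 0 once extending it would make the two occurrences overlap.
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # Occurrences end at i-1 and j-1; they stay disjoint while length <= j - i.
            if s[i - 1] == s[j - 1] and dp[i - 1][j - 1] < j - i:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return s[best_end - best_len:best_end]


print(longest_nonoverlapping_repeat("acaca"))  # ac
```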

Suffix tree library for C++ with simple examples of how to use it

时光毁灭记忆、已成空白 submitted on 2019-12-04 11:52:25
I'm searching for a suffix tree library (with linear-time construction), and all I found is PATL, but PATL has no documentation and I can't figure out any of the examples. So is there a suffix tree library for C++ with decent documentation? PATL home: http://code.google.com/p/patl/ EDIT: Motivation: I need to process a large number of strings, find the frequent common substrings, and report if more than n occurrences of any substring occurred within t seconds. I implemented a tree (with a counter in the nodes; actually, it isn't a counter but a std::vector of visit times, since like I
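The motivating task can be sketched without a library, assuming a naive suffix trie (not a linear-time suffix tree) whose nodes keep a deque of visit times in place of a counter. The class name, the `max_len` cutoff, and the rule of reporting on every insert are hypothetical choices for illustration; every occurrence of a substring, including repeats inside one input string, adds one timestamp, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class TimedSuffixTrie:
    """Naive suffix trie; each node keeps a deque of recent visit times.

    Hypothetical sketch: insertion costs O(len(s) * max_len), not linear time.
    """

    def __init__(self, window, threshold, max_len=8):
        self.window = window        # t: length of the sliding window in seconds
        self.threshold = threshold  # n: report a substring seen more than n times
        self.max_len = max_len      # only substrings up to this length are tracked
        self.root = {"children": {}, "times": deque()}

    def add(self, s, now):
        """Record every substring of s (up to max_len) as seen at time `now`.

        Returns the substrings whose count inside the window exceeds threshold.
        """
        hot = set()
        for start in range(len(s)):
            node = self.root
            for end in range(start, min(len(s), start + self.max_len)):
                node = node["children"].setdefault(
                    s[end], {"children": {}, "times": deque()})
                times = node["times"]
                times.append(now)
                # Forget visits outside the window (now - window, now].
                while times and times[0] <= now - self.window:
                    times.popleft()
                if len(times) > self.threshold:
                    hot.add(s[start:end + 1])
        return hot


trie = TimedSuffixTrie(window=60.0, threshold=2)
for t in (0.0, 1.0, 2.0):
    hot = trie.add("abc", t)
print(sorted(hot))  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```

A deque makes dropping expired visits O(1) per removal, whereas erasing from the front of a std::vector shifts every remaining element.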

Short, Java implementation of a suffix tree and usage?

末鹿安然 submitted on 2019-12-04 09:30:59
Question: I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies within the Semantic Discovery Toolkit, but the implementation is several thousand lines long and spans several classes. Ideally, the implementation would be as short as possible and span no more than a few hundred lines. Does anyone have such an implementation? Answer 1: I just finished a Java implementation of a suffix tree. In my blog entry you can find out more about suffix trees, see
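For a sense of how short a suffix tree can be when linear time is not required, here is a compressed suffix tree built by inserting suffixes one by one and splitting an edge at the first mismatch. It is an O(n^2) sketch, not Ukkonen's algorithm; edge labels are stored as index pairs into the text, and the `$` terminator is an assumption (it must not occur in the input or in search patterns).

```python
class Node:
    def __init__(self):
        # First character of an edge label -> (start, end, child);
        # the label itself is text[start:end].
        self.edges = {}


class NaiveSuffixTree:
    """Compressed suffix tree built by naive suffix insertion, O(n^2)."""

    def __init__(self, text, terminator="$"):
        self.text = text + terminator  # unique end marker: every suffix ends at a leaf
        self.root = Node()
        for i in range(len(self.text)):
            self._insert_suffix(i)

    def _insert_suffix(self, i):
        node, pos, t = self.root, i, self.text
        while True:
            first = t[pos]
            if first not in node.edges:
                node.edges[first] = (pos, len(t), Node())  # new leaf edge
                return
            start, end, child = node.edges[first]
            k = 0  # walk along the edge while characters match
            while start + k < end and t[start + k] == t[pos + k]:
                k += 1
            if start + k == end:  # whole edge matched: descend
                node, pos = child, pos + k
                continue
            # Mismatch inside the edge: split it after k characters.
            mid = Node()
            mid.edges[t[start + k]] = (start + k, end, child)
            mid.edges[t[pos + k]] = (pos + k, len(t), Node())
            node.edges[first] = (start, start + k, mid)
            return

    def contains(self, pattern):
        """True if pattern is a substring of the indexed text."""
        node, j, t = self.root, 0, self.text
        while j < len(pattern):
            if pattern[j] not in node.edges:
                return False
            start, end, child = node.edges[pattern[j]]
            for idx in range(start, end):
                if j == len(pattern):
                    return True
                if t[idx] != pattern[j]:
                    return False
                j += 1
            node = child
        return True


tree = NaiveSuffixTree("banana")
print(tree.contains("nan"), tree.contains("nab"))  # True False
```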

Ukkonen's algorithm for Generalized Suffix Trees

ぐ巨炮叔叔 submitted on 2019-12-03 19:09:22
Question: I am currently working on my own suffix tree implementation (in C++, but the question is language-agnostic). I studied Ukkonen's original paper. The article is very clear, so I got to work on my implementation and tried to tackle the problem of generalized suffix trees. In the tree, each substring leading from one node to another is represented by a pair of integers. While this is straightforward for a regular suffix tree, a problem arises when multiple strings coexist in the
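One common workaround, sketched here under the assumption that a distinct terminator per string is acceptable, is to build the tree over a single concatenated text in which every string is followed by its own terminator. Each edge can then stay a single (start, end) pair into that text, and any index maps back to (string id, offset). A leaf built over the joined text would otherwise run across its terminator into the next string, so its end is clipped there. Storing a triple (string id, start, end) per edge is the other usual option. The helper names and the choice of private-use characters starting at U+E000 are hypothetical.

```python
from bisect import bisect_right


def concatenate_with_terminators(strings):
    """Join strings, appending a distinct terminator after each one.

    Terminators come from the Unicode private-use area starting at U+E000
    (an assumption: these characters must not appear in the inputs).
    Returns the joined text and the start offset of each string in it.
    """
    parts, starts, offset = [], [], 0
    for k, s in enumerate(strings):
        starts.append(offset)
        parts.append(s + chr(0xE000 + k))
        offset += len(s) + 1
    return "".join(parts), starts


def to_local(starts, global_index):
    """Map an index into the joined text to (string id, offset in that string)."""
    k = bisect_right(starts, global_index) - 1
    return k, global_index - starts[k]


def clip_leaf_edge(starts, strings, start, end):
    """Shorten a leaf edge (start, end) so it stops after its own terminator."""
    k, _ = to_local(starts, start)
    own_end = starts[k] + len(strings[k]) + 1  # one past string k's terminator
    return start, min(end, own_end)


strings = ["abc", "bc"]
text, starts = concatenate_with_terminators(strings)
print(starts)                                         # [0, 4]
print(to_local(starts, 5))                            # (1, 1)
print(clip_leaf_edge(starts, strings, 1, len(text)))  # (1, 4)
```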

Haskell Data Type With References

谁说我不能喝 submitted on 2019-12-03 13:56:19
I'm implementing Ukkonen's algorithm, which requires that all leaves of a tree contain a reference to the same integer, and I'm doing it in Haskell to learn more about the language. However, I'm having a hard time writing a data type that does this.

    -- Node has children, indexes of info on the edge
    -- to it, and an optional suffix link.
    -- Leaf has a beginning index of the info, but the
    -- end index is always an incrementing variable index.
    data STree = Node [STree] (Int, Int) (Maybe STree)
               | Leaf (Int, ??? )

How can I put the reference in the Leaf type declaration? Source: https:/
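The sharing being asked about can be illustrated with a Python analogue (not Haskell): every leaf holds a reference to one mutable `End` cell, so a single O(1) assignment extends all open leaves at once, which is the "global end" trick in Ukkonen's algorithm. In Haskell the corresponding options would be a mutable reference in the constructor, for example `Leaf Int (IORef Int)` using `Data.IORef` or an `STRef s Int` from `Data.STRef`, or leaving the end out of `Leaf` entirely and supplying the current end whenever a label is read. The class names and the leaf positions below are hypothetical and only demonstrate the sharing, not a real run of the algorithm.

```python
class End:
    """One mutable integer shared by every leaf (Ukkonen's 'global end')."""

    def __init__(self, value):
        self.value = value


class Leaf:
    def __init__(self, start, end):
        self.start = start
        self.end = end  # the shared End object itself, not a copy of its value

    def label(self, text):
        return text[self.start:self.end.value]


text = "abcab"
end = End(2)
a, b = Leaf(0, end), Leaf(1, end)
print(a.label(text), b.label(text))  # ab b
end.value = 5  # one O(1) update extends both leaves
print(a.label(text), b.label(text))  # abcab bcab
```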

Longest Non-Overlapping Repeated Substring using Suffix Tree/Array (Algorithm Only)

∥☆過路亽.° submitted on 2019-12-03 12:30:11
I need to find the longest non-overlapping repeated substring in a string. I have the suffix tree and suffix array of the string available. When overlapping is allowed, the answer is trivial (the deepest internal node in the suffix tree). For example, for the string "acaca", the answer is "aca" if overlapping is allowed, but "ac" or "ca" if it is not. I need only the algorithm or a high-level idea. P.S.: I tried, but I couldn't find a clear answer on the web. Answer 1: Generate the suffix array by sorting the suffixes in O(n log n). P.S.: There are more efficient algorithms, such as DC3 and Ukkonen's algorithm.
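The answer stops after building the suffix array, so here is one way to finish it; it is a swapped-in technique, not necessarily what the answerer had in mind. Build the suffix array (naively by sorting, for brevity), compute the LCP array with Kasai's algorithm, then binary search the length L: suffixes sharing a prefix of at least L form contiguous runs in the suffix array, and a run yields a non-overlapping repeat when its smallest and largest start positions are at least L apart. Function names are hypothetical.

```python
def suffix_array(s):
    """Start positions of all suffixes in sorted order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai's algorithm: lcp[r] = common-prefix length of suffixes sa[r-1], sa[r]."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def find_repeat_start(sa, lcp, length):
    """Start of a substring of `length` occurring twice without overlap, or -1."""
    lo = hi = sa[0]
    for r in range(1, len(sa)):
        if lcp[r] >= length:  # still inside a run sharing a prefix of `length`
            lo, hi = min(lo, sa[r]), max(hi, sa[r])
            if hi - lo >= length:
                return lo
        else:  # run broken: start a new one at this suffix
            lo = hi = sa[r]
    return -1


def longest_nonoverlapping_repeat_sa(s):
    if not s:
        return ""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    # If length L is achievable, so is L - 1, so the answer can be binary searched.
    best_len, best_start = 0, 0
    low, high = 1, len(s) // 2
    while low <= high:
        mid = (low + high) // 2
        start = find_repeat_start(sa, lcp, mid)
        if start >= 0:
            best_len, best_start, low = mid, start, mid + 1
        else:
            high = mid - 1
    return s[best_start:best_start + best_len]


print(longest_nonoverlapping_repeat_sa("acaca"))  # ac
```

With a linear-time suffix array construction such as DC3 in place of the naive sort, the remaining work is O(n log n).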

How to call a module written with argparse in an iPython notebook

China☆狼群 submitted on 2019-12-03 10:49:50
Question: I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How can I use this without rewriting main()? By the way, this write-up of Ukkonen's algorithm is fantastic. Answer 1: I've had a similar problem before, but using optparse instead of argparse. You don't need to change anything in the original script, just

Short, Java implementation of a suffix tree and usage?

北城以北 submitted on 2019-12-03 04:02:44
I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies within the Semantic Discovery Toolkit, but the implementation is several thousand lines long and spans several classes. Ideally, the implementation would be as short as possible and span no more than a few hundred lines. Does anyone have such an implementation? Answer 1: I just finished a Java implementation of a suffix tree. In my blog entry you can find out more about suffix trees, see how to use my library, and download and build the library using Subversion and Maven. Yes, it's

How to call a module written with argparse in an iPython notebook

南笙酒味 submitted on 2019-12-03 01:20:12
I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How can I use this without rewriting main()? By the way, this write-up of Ukkonen's algorithm is fantastic. Answer 1: I've had a similar problem before, but using optparse instead of argparse. You don't need to change anything in the original script; just assign a new list to sys.argv like so:

    if __name__ == "__main__":
        from Bio import SeqIO
        path = '/path/to
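A self-contained sketch of the `sys.argv` approach, runnable outside the real script. `main()` here is a hypothetical stand-in with made-up arguments (and `suffix_tree.py` is a made-up program name), not the interface of the actual implementation. The key fact is that `parse_args()` with no argument list reads `sys.argv[1:]`, so replacing `sys.argv` before calling `main()` is enough; saving and restoring the original list is an extra precaution, since the notebook kernel has its own argv.

```python
import argparse
import sys


def main(argv=None):
    """Hypothetical stand-in for the script's main(); its real options differ."""
    parser = argparse.ArgumentParser(description="Build a suffix tree.")
    parser.add_argument("path", help="input file (made-up argument)")
    parser.add_argument("--verbose", action="store_true")
    # With argv=None, parse_args() reads sys.argv[1:], exactly as it does
    # when the unmodified script is run from a shell.
    return parser.parse_args(argv)


# Approach from the answer: put a fake command line into sys.argv, call main().
saved_argv = sys.argv
sys.argv = ["suffix_tree.py", "sequences.fasta", "--verbose"]
try:
    notebook_args = main()
finally:
    sys.argv = saved_argv  # give the notebook kernel its own argv back
print(notebook_args.path, notebook_args.verbose)  # sequences.fasta True

# Alternative, if you can edit main(): accept an argv list and pass it directly.
direct_args = main(["other.fasta"])
print(direct_args.path, direct_args.verbose)  # other.fasta False
```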

Find the longest repeating substring in a string?

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 14:16:40
I came across the program below, which looks perfect. As far as I can tell, its time complexity is n log n, where n is the length of the string: n for storing the different strings, n log n for sorting, and n for comparison. So the time complexity is n log n. The space complexity is n for storing the n substrings. My question is: can it be further optimized?

    public class LRS {
        // return the longest common prefix of s and t
        public static String lcp(String s, String t) {
            int n = Math.min(s.length(), t.length());
            for (int i = 0; i < n; i++) {
                if (s.charAt(i) != t.charAt(i))
                    return s.substring(0, i);
            }
            return s.substring(0, n);
        }

        //
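The program above appears to sort all suffixes and take the longest common prefix of neighbours; the rest of the class is cut off, so the Python sketch below only mirrors that idea (`lcp` follows the Java method, `lrs` is a hypothetical name). It works because any repeated substring is a common prefix of two suffixes, and the longest one is shared by two suffixes that end up adjacent after sorting. On the complexity estimate: each comparison during the sort can read up to n characters and each slice copies its suffix, so the worst case (for instance a string of one repeated letter) is roughly O(n^2 log n) time and O(n^2) memory rather than n log n. A suffix array with an LCP array, or a suffix tree (where the answer is the deepest internal node, found in linear time once the tree is built), avoids most of that.

```python
def lcp(s, t):
    """Longest common prefix of s and t (mirrors LRS.lcp above)."""
    n = min(len(s), len(t))
    for i in range(n):
        if s[i] != t[i]:
            return s[:i]
    return s[:n]


def lrs(text):
    """Longest repeated substring, overlaps allowed, via sorted suffixes."""
    suffixes = sorted(text[i:] for i in range(len(text)))  # copies every suffix
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):  # only neighbours need comparing
        candidate = lcp(a, b)
        if len(candidate) > len(best):
            best = candidate
    return best


print(lrs("acaca"))  # aca
```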