sequence-alignment

Alignment of multiple (non-biological, discrete state) sequences

大兔子大兔子 submitted on 2021-02-18 17:53:51
Question: I have some data that describes an ordered set of discrete events (or states). There are 34 possible states, which may occur in any order and may repeat. Each sequence can contain any number of events, and, crucially, there are more than two sequences. My eventual aim is to cluster these sequences into similar subsets, but my hunch is that this cannot be meaningful unless the sequences are aligned so that equivalent events occupy the same position in all sequences. I'm
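One possible starting point, offered as a sketch rather than as the asker's eventual solution: pairwise edit distances between event sequences can feed a clustering step directly, without building a full multiple alignment first. The example assumes unit costs for insertion, deletion, and substitution, and the state labels (`login`, `search`, and so on) are hypothetical stand-ins for the 34 states.

```python
from typing import Hashable, List, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences of arbitrary hashable states."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(seqs: List[Sequence[Hashable]]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(seqs[i], seqs[j])
    return d


# Hypothetical event sequences of different lengths.
events = [
    ["login", "search", "view", "buy"],
    ["login", "view", "buy"],
    ["search", "search", "logout"],
]
matrix = distance_matrix(events)
```

The resulting matrix can be handed to any distance-based clustering routine. Whether unit costs are appropriate depends on how similar the states are to one another, which the question does not specify.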

BioPython AlignIO ValueError says strings must be same length?

安稳与你 submitted on 2021-02-10 11:30:46
Question: Input FASTA-format text file: http://www.jcvi.org/cgi-bin/tigrfams/DownloadFile.cgi?file=/opt/www/www_tmp/tigrfams/fa_alignment_PF00205.txt

    #!/usr/bin/python
    from Bio import AlignIO
    seq_file = open('/path/to/fa_alignment_PF00205.txt')
    alignment = AlignIO.read(seq_file, "fasta")

Error:

    ValueError: Sequences must all be the same length

The input sequences shouldn't have to be the same length, since on ClustalOmega you can align sequences of differing lengths. This also doesn't work...gets the
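The error is consistent with what `AlignIO` is for: `AlignIO.read` parses a file that already contains an alignment, in which every row has the same length once gaps are counted, and it does not perform the alignment itself. For unaligned records, Biopython's reader is `Bio.SeqIO.parse`. The sketch below uses only the standard library (so it is not Biopython's own logic) to check whether the records in a FASTA file share one length; `fasta_lengths` and `is_aligned` are hypothetical helper names, and the demo records are made up.

```python
import io
from typing import Dict, TextIO


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Map each FASTA record name to its total sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = header[0] if header else f"record{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def is_aligned(lengths: Dict[str, int]) -> bool:
    """True when every record has the same length, as an alignment file requires."""
    return len(set(lengths.values())) <= 1


# Made-up demo input: two records of different lengths.
demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACG\n")
lengths = fasta_lengths(demo)
aligned = is_aligned(lengths)
```

If the lengths differ, the file holds unaligned sequences and would need to go through an aligner before an alignment parser can accept it.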

BioPython AlignIO ValueError says strings must be same length?

北城余情 submitted on 2021-02-10 11:28:17
Question: Input FASTA-format text file: http://www.jcvi.org/cgi-bin/tigrfams/DownloadFile.cgi?file=/opt/www/www_tmp/tigrfams/fa_alignment_PF00205.txt

    #!/usr/bin/python
    from Bio import AlignIO
    seq_file = open('/path/to/fa_alignment_PF00205.txt')
    alignment = AlignIO.read(seq_file, "fasta")

Error:

    ValueError: Sequences must all be the same length

The input sequences shouldn't have to be the same length, since on ClustalOmega you can align sequences of differing lengths. This also doesn't work...gets the

Sequence Alignment Algorithm with a group of characters instead of one character

徘徊边缘 submitted on 2020-01-14 13:28:09
Question: Summary: I begin with some details about alignment algorithms and ask my question at the end. If you already know about alignment algorithms, skip the beginning. Suppose we have two strings like:

    ACCGAATCGA
    ACCGGTATTAAC

There are algorithms, such as Smith-Waterman or Needleman–Wunsch, that align these two sequences and create a matrix. Take a look at the result in the following section:

    Smith-Waterman Matrix

      §  §  A  C  C  G  A  A  T  C  G  A
      §  0  0  0  0  0  0  0  0  0  0  0
      A  0  4  0  0  0  4  4  0  0  0  4
      C  0  0 13  9  4
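For readers who want to see how such a matrix is filled, here is a minimal sketch of the Smith-Waterman scoring recurrence with a linear gap penalty. The scores (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration; they are not the scoring scheme behind the matrix in the question, so the numbers will differ.

```python
from typing import List


def smith_waterman_matrix(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> List[List[int]]:
    """Fill the local-alignment score matrix H (rows follow a, columns follow b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # restart the local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    return H


# The two strings from the question.
H = smith_waterman_matrix("ACCGGTATTAAC", "ACCGAATCGA")
best = max(max(row) for row in H)
```

The clamp at zero is what makes the alignment local: a cell never inherits a negative running score, so an alignment can start anywhere in the matrix.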

How do I group similar strings in R? [closed]

送分小仙女□ submitted on 2019-12-21 23:56:16
Question: I have a database with ~5,000 locality names, most of which are repetitions with typos, permutations, abbreviations, etc. I would like to group them by similarity, to speed up further processing. The best would be to convert each variation into a "platonic form", and put two columns side by side, with the

Looping across 10 columns at a time in R

喜夏-厌秋 submitted on 2019-12-10 10:45:53
Question: I have a dataframe with 1000 columns. I am trying to loop over 10 columns at a time and use the seqdef() function from the TraMineR package to do sequence alignment across the data in those columns. Hence, I want to apply this function to columns 1-10 in the first go-around, columns 11-20 in the second go-around, and so on, all the way up to 1000. This is the code I am using.

    library(TraMineR)
    by(df[, 1:10], seqdef(df))

However, this only loops over the first 10 and then stops. How do I loop it

How do I group similar strings in R? [closed]

孤人 submitted on 2019-12-04 18:47:16
I have a database with ~5,000 locality names, most of which are repetitions with typos, permutations, abbreviations, etc. I would like to group them by similarity, to speed up further processing. The best would be to convert each variation into a "platonic form", and put two columns side by side, with the original and platonic forms. I've read about multiple sequence alignment, but this seems to be mostly used in bioinformatics, for sequences of DNA/RNA/peptides. I'm not sure it will work well with names of places. Does anyone know of a library that can help me do this in R? Or which of the many
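The question asks about R, but the underlying idea is language-agnostic, so here is a rough Python sketch of it using the standard library's `difflib.SequenceMatcher`. It is a greedy, single-pass grouping, not a full clustering method: each name joins the first group whose representative is similar enough, and the representative serves as the "platonic form". The `0.8` threshold, the `normalize` rule, and the place names are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(name: str) -> str:
    """Lower-case, replace periods with spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def group_similar(names: List[str], threshold: float = 0.8) -> List[Tuple[str, List[str]]]:
    """Greedy grouping: returns (representative, members) pairs."""
    groups: List[Tuple[str, List[str]]] = []
    for name in names:
        key = normalize(name)
        for rep, members in groups:
            # ratio() is 1.0 for identical strings and 0.0 for nothing in common
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                members.append(name)
                break
        else:
            groups.append((key, [name]))  # no close group: start a new one
    return groups


# Made-up locality names with typos and case differences.
names = ["Sao Paulo", "Sao Paolo", "sao paulo", "Rio de Janeiro", "Rio de Janiero"]
groups = group_similar(names)
```

Because the result depends on the order of the input and on which spelling happens to become the representative, a sketch like this is better suited to a first pass that a person then reviews than to fully automatic cleaning.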

Non-biological sequence alignment tool

荒凉一梦 submitted on 2019-12-04 07:36:38
Are there any tools/libraries for aligning sequences over arbitrarily large alphabets? Almost all of the sequence alignment tools on the market are focused on biological sequences (nucleotides or peptides). In my case, however, sequences are composed of hundreds of distinct elements and cannot be encoded as ASCII strings. So I need a tool or library that can simply align two (or more) integer arrays. I couldn't find such a tool or library, so I implemented my own Python library for generic sequence alignment. It is open-source: https://github.com/eseraygun/python-alignment You can also
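To illustrate why the alphabet size does not matter to the algorithm itself, here is a minimal Needleman-Wunsch sketch that aligns two integer lists directly. It is not the API of the linked library; `align_global` is a hypothetical name, the scores are assumed values, and `None` marks a gap.

```python
from typing import List, Optional, Sequence, Tuple

Aligned = List[Optional[int]]


def align_global(a: Sequence[int], b: Sequence[int], match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> Tuple[int, Aligned, Aligned]:
    """Global alignment of two integer sequences; None marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: Aligned = []
    out_b: Aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)      # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Elements are compared only with ==, so any integer codes work.
total, row_a, row_b = align_global([101, 205, 307, 409], [101, 307, 409])
```

Since elements are only ever compared for equality, the same function works for any hashable or comparable items. A similarity-aware scoring function would be the natural extension if some elements are closer to each other than others.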

Traceback in Smith-Waterman algorithm with affine gap penalty

百般思念 submitted on 2019-12-03 12:37:07
Question: I'm trying to implement the Smith-Waterman algorithm for local sequence alignment using the affine gap penalty function. I think I understand how to initialize and compute the matrices required for calculating alignment scores, but I am clueless as to how to then trace back to find the alignment. To generate the 3 matrices required I have the following code

    for j in range(1, len2):
        for i in range(1, len1):
            fxOpen = F[i][j-1] + gap
            xExtend = Ix[i][j-1] + extend
            Ix[i][j] = max(fxOpen, xExtend)

Traceback in Smith-Waterman algorithm with affine gap penalty

≡放荡痞女 submitted on 2019-12-03 03:06:00
I'm trying to implement the Smith-Waterman algorithm for local sequence alignment using the affine gap penalty function. I think I understand how to initialize and compute the matrices required for calculating alignment scores, but I am clueless as to how to then trace back to find the alignment. To generate the 3 matrices required I have the following code

    for j in range(1, len2):
        for i in range(1, len1):
            fxOpen = F[i][j-1] + gap
            xExtend = Ix[i][j-1] + extend
            Ix[i][j] = max(fxOpen, xExtend)
            fyOpen = F[i-1][j] + gap
            yExtend = Iy[i-1][j] + extend
            Iy[i][j] = max(fyOpen, yExtend)
            matchScore = (F[i-1]
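One common way to do the traceback, sketched here under stated assumptions rather than as a completion of the asker's code: keep track of which of the three matrices the path is currently in, and at each step check which predecessor reproduces the current cell's value. The matrices `M`, `X`, and `Y` below play roles similar to the question's `F`, `Ix`, and `Iy`, but the recurrences are this sketch's own. `M` holds only scores of alignments ending in a match or mismatch, `gap_open` is the score of the first gap position, `gap_extend` applies to each further position, and a gap in one sequence is not allowed to follow a gap in the other directly.

```python
from typing import Tuple

NEG_INF = float("-inf")


def local_align_affine(a: str, b: str, match: int = 2, mismatch: int = -1,
                       gap_open: int = -3, gap_extend: int = -1) -> Tuple[int, str, str]:
    """Smith-Waterman with affine gaps; returns (score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    M = [[0] * (m + 1) for _ in range(n + 1)]        # ends with a[i-1] aligned to b[j-1]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] against a gap
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with b[j-1] against a gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = max(0, prev + s)
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)
    # Traceback: start at the best cell of M and follow the state that produced it.
    i, j = best_pos
    state = "M"
    out_a, out_b = [], []
    while i > 0 and j > 0:
        if state == "M":
            if M[i][j] == 0:  # a zero in M marks the start of the local alignment
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            if M[i][j] == M[i - 1][j - 1] + s:
                state = "M"
            elif M[i][j] == X[i - 1][j - 1] + s:
                state = "X"
            else:
                state = "Y"
            i, j = i - 1, j - 1
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            state = "M" if X[i][j] == M[i - 1][j] + gap_open else "X"
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            state = "M" if Y[i][j] == M[i][j - 1] + gap_open else "Y"
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = local_align_affine("AAACCCGGG", "AAAGGG")
```

With the assumed scores, the example pays one gap opening and two extensions to bridge `CCC`, which scores higher than stopping after `AAA`. Storing explicit back-pointers while filling the matrices is an alternative to recomputing the comparisons during traceback, and it avoids relying on exact equality when scores are floating-point.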