dna-sequence

Find length of overlap in strings [closed]

心不动则不痛 submitted on 2021-02-16 13:40:33
Question: Do you know of any ready-to-use method to obtain the overlap of two strings, and its length? It has to be in R, maybe something from stringr? I was looking here, unfortunately without success. str1 <- 'ABCDE' str2 <- 'CDEFG' str_overlap(str1, str2) 'CDE' str_overlap
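The excerpt is cut off, so the exact definition of "overlap" is not visible. A minimal Python sketch, assuming the overlap means the longest suffix of the first string that is also a prefix of the second (which matches the 'ABCDE' / 'CDEFG' example), could look like this. The name `str_overlap` is borrowed from the question's wished-for function and is not an existing stringr function.

```python
def str_overlap(left: str, right: str) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`.

    Assumption: "overlap" means a suffix/prefix overlap, not the longest
    common substring anywhere in the two strings.
    """
    # Try the largest possible overlap first and shrink until one fits.
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return right[:size]
    return ""


overlap = str_overlap("ABCDE", "CDEFG")
print(overlap, len(overlap))  # the overlap and its length
```

If the intended meaning is the longest common substring instead, a different algorithm would be needed.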

a query to generate 5 random DNA sequences that are each about 20 bases, [duplicate]

♀尐吖头ヾ submitted on 2020-07-10 08:46:09
Question: This question already has answers here: Postgresql:Generate Sequence. I have this query, which produces the first sequence of 20 bases, but I don’t know how to extend it to 5 rows: prepare dna_length(int) as with t1 as ( select chr(65) as s union select chr(67) union select chr(71) union select chr(84) ) , t2 as ( select s, row_number() over() as rn from t1) , t3 as ( select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn ) , t4 as ( select t2.s from t2 join t3 on
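The question asks for a PostgreSQL query, and the query above is truncated, so it is not completed here. As a rough illustration of the intended result only, the following Python sketch draws each base independently for each of 5 sequences. The function name `random_dna` and its parameters are hypothetical, and the length is fixed at 20 rather than "about 20".

```python
import random


def random_dna(n_sequences=5, length=20, seed=None):
    """Return `n_sequences` random DNA strings of `length` bases each.

    Assumption: every base is drawn uniformly and independently from ACGT.
    """
    rng = random.Random(seed)  # seed is optional, for reproducible output
    return [
        "".join(rng.choice("ACGT") for _ in range(length))
        for _ in range(n_sequences)
    ]


for sequence in random_dna():
    print(sequence)
```

The point carried over to the SQL version is that the random draw has to be repeated per row and per position, not computed once and reused.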

How to catch the longest sequence of a group

情到浓时终转凉″ submitted on 2020-07-05 12:34:20
Question: The task is to find the longest run of a repeated group. For instance, the DNA sequence "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC" has 7 occurrences of AGATC, and the pattern (AGATC) matches all of them. Is it possible to write a regular expression that catches only the longest run, i.e. AGATCAGATCAGATCAGATCAGATC in the given text? If this is not possible with regex alone, how can I iterate through each run (i.e. 1st run is AGATCAGATC , 2nd -
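One possible approach, sketched in Python since the excerpt does not name a language: let a regex find every maximal run of back-to-back repeats, then pick the longest run in ordinary code. The helper name `longest_run` is hypothetical.

```python
import re


def longest_run(text, unit):
    """Return the longest stretch of `unit` repeated back to back in `text`."""
    # (?:unit)+ greedily matches one maximal run of consecutive repeats.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(text)]
    # default="" covers the case where `unit` never occurs.
    return max(runs, key=len, default="")


dna = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
run = longest_run(dna, "AGATC")
print(run, len(run) // len("AGATC"))  # the run and its repeat count
```

Iterating over `pattern.finditer(text)` directly gives each run in order, which covers the second half of the question.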

Processing a sub-list of variable size within a larger list

Deadly submitted on 2020-01-14 03:05:50
Question: I'm a biological engineering PhD student trying to teach myself Python programming in order to automate part of my research, but I've run into a problem with processing sub-lists within a bigger list that I can't seem to solve. Basically, the goal is to write a small script that will process a CSV file containing a list of plasmid sequences that I'm building using various DNA assembly methods, and then spit out the primer sequences that I need to order in order to

Reverse complement of DNA strand using Python

懵懂的女人 submitted on 2020-01-10 19:44:08
Question: I have a DNA sequence and would like to get its reverse complement using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is that a few cells contain something other than A, T, G and C. I was able to get the reverse complement with this piece of code: def complement(seq): complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} bases = list(seq) bases = [complement[base] for base in bases] return '
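The dictionary lookup in the excerpt raises a KeyError for any character outside A, C, G and T. A minimal sketch that avoids this, assuming unknown characters should simply be left unchanged (the excerpt is truncated before it says what should happen to them), uses a translation table. The CSV handling is omitted here.

```python
# Translation table for both upper- and lower-case bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string.

    Assumption: characters without an entry in COMPLEMENT (for example N
    or '-') are passed through unchanged, because str.translate leaves
    unmapped characters alone.
    """
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
print(reverse_complement("ANGT"))
```

If such cells should be skipped or flagged instead, the function could check `set(seq) <= set("ACGT")` before translating.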

exec() not returning process ID

时间秒杀一切 submitted on 2020-01-05 07:36:42
Question: I'm using the PHP exec() function to execute the Canu assembler program, and I want to get its process ID within the same script. The problem is that exec() does not return any PID, even though the process is running successfully. The processes are started like this: $gnuplot_path = '/usr/bin/gnuplot'; $command = 'nohup canu -d . -p E.coli gnuplot='.$gnuplot_path.' genomeSize=4.8m useGrid=false maxThreads=30 -pacbio-raw /path/to/p6.25x.fastq > /path/to/process.err 2>&1 &'; Currently, I try to determine if

Python regex module fuzzy match: substitution count not as expected

喜你入骨 submitted on 2020-01-04 18:42:32
Question: Background: The Python module regex allows fuzzy matching. You can specify the allowable number of substitutions (s), insertions (i), deletions (d), and total errors (e). The fuzzy_counts property of a match result returns a tuple such as (0,0,0), where: match.fuzzy_counts[0] = count for 's', match.fuzzy_counts[1] = count for 'i', match.fuzzy_counts[2] = count for 'd'. Problem: The deletions and insertions are counted as expected, but not the substitutions. In the example below, the only change is a

Four-Gamete-Test in R

China☆狼群 submitted on 2019-12-24 12:08:11
Question: I have (will have) data that looks like the following:
Individual  Nuk  Name        Position  Individual.1  Nuk.1  Name.1      Position.1
Ind 1       A    Locus_1988  23        Ind 1         A      Locus_3333  15
Ind 2       A    Locus_1988  23        Ind 2         G      Locus_3333  15
Ind 3       G    Locus_1988  23        Ind 3         A      Locus_3333  15
Ind 4       G    Locus_1988  23        Ind 4         -      Locus_3333  15
Ind 5       A    Locus_1988  23        Ind 5         G      Locus_3333  15
Ind 6       G    Locus_1988  23        Ind 6         G      Locus_3333  15
Ind 1       C    Locus_1988  23        Ind 1         C      Locus_3333  18
Ind 2       T    Locus_1988  23        Ind 2         C      Locus_3333  18
Ind 3       T    Locus_1988  23
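The question asks for R and is truncated, so the following is only a Python sketch of the four-gamete test itself: for a pair of biallelic sites, check whether all four two-locus allele combinations are observed. The function name `four_gamete_test` and the treatment of "-" as missing data are assumptions. The example uses the first six rows of the data above (Locus_1988 position 23 against Locus_3333 position 15).

```python
def four_gamete_test(locus_a, locus_b, missing="-"):
    """Return True if all four two-locus gametes are observed.

    `locus_a` and `locus_b` hold one allele per individual, in the same
    order. Assumption: individuals with a missing allele at either site
    are dropped, and the test only applies to two biallelic sites.
    """
    gametes = {
        (a, b) for a, b in zip(locus_a, locus_b) if missing not in (a, b)
    }
    alleles_a = {a for a, _ in gametes}
    alleles_b = {b for _, b in gametes}
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        return False  # not a pair of biallelic sites in this sketch
    return len(gametes) == 4


locus_1988 = ["A", "A", "G", "G", "A", "G"]  # Ind 1..6, position 23
locus_3333 = ["A", "G", "A", "-", "G", "G"]  # Ind 1..6, position 15
print(four_gamete_test(locus_1988, locus_3333))
```

Applied to a full data set, the same check would be run over every pair of sites.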

How to find specific frequency of a codon?

痞子三分冷 submitted on 2019-12-23 21:16:49
Question: I am trying to write a function in R that calculates the frequency of each codon. Methionine is an amino acid that is encoded by only one codon, ATG, so that codon's share in any sequence is 1, whereas glycine can be encoded by GGT, GGC, GGA or GGG, so each of those codons has a share of 0.25. The input would be a DNA sequence such as ATGGGTGGCGGAGGG, and with the help of a codon table the function should calculate the percentage of each occurrence in an
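The question asks for R and is cut off, so this Python sketch rests on an assumption: that the wanted number is each codon's observed share among the codons for the same amino acid in the input sequence. The codon table is the standard genetic code, built from the usual TCAG ordering; `codon_usage` is a hypothetical name.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(dna):
    """Map each observed codon to its share among synonymous codons."""
    dna = dna.upper()
    # Read in frame from position 0 and ignore a trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


print(codon_usage("ATGGGTGGCGGAGGG"))
```

For the question's example this gives 1.0 for ATG and 0.25 for each of the four glycine codons, matching the figures quoted above.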