dna-sequence

Reverse complement of DNA strand using Python

感情迁移 submitted on 2019-11-30 14:20:25
I have a DNA sequence and would like to get its reverse complement using Python. The sequence is in one of the columns of a CSV file, and I'd like to write the reverse complement to another column in the same file. The tricky part is that a few cells contain characters other than A, T, G and C. I was able to get the reverse complement with this piece of code:

```python
def complement(seq):
    # Lookup table for the four bases; any other character (e.g. 'N')
    # raises a KeyError below, which is the problem described above.
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)


def reverse_complement(s):
    # Reverse the string, then complement each base.
    return complement(s[::-1])


print("Reverse Complement:")
# (the original excerpt is truncated here)
```
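A minimal sketch (not the asker's code) that tolerates the other characters: `str.maketrans`/`str.translate` complement A, C, G and T in upper and lower case, and pass any character missing from the table, such as `N`, through unchanged instead of raising a `KeyError`. IUPAC ambiguity codes such as R or Y are also left as-is here, which may not be what you want. The column names `sequence` and `revcomp` and the in-memory sample data are hypothetical; with a real file you would read it and write the result to a new file rather than editing it in place.

```python
import csv
import io

# Complement table for upper- and lower-case bases. str.translate leaves any
# character missing from the table (e.g. 'N', '-', IUPAC codes) unchanged,
# so unexpected symbols no longer raise KeyError.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of seq; unknown symbols pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def add_revcomp_column(rows, src="sequence", dst="revcomp"):
    """Yield each CSV row (a dict) with a reverse-complement column added.

    'sequence' and 'revcomp' are hypothetical column names.
    """
    for row in rows:
        row[dst] = reverse_complement(row[src])
        yield row


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real CSV file.
    sample = "id,sequence\n1,ATGC\n2,ANNGT\n"
    reader = csv.DictReader(io.StringIO(sample))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["revcomp"])
    writer.writeheader()
    writer.writerows(add_revcomp_column(reader))
    print(out.getvalue())
```

Using `csv.DictReader`/`DictWriter` keeps the new column tied to the header by name, so the position of the sequence column in the file does not matter.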

how to match dna sequence pattern

大兔子大兔子 submitted on 2019-11-28 19:36:17
I am having trouble finding an approach to this problem. The input/output sequences are as follows:

- **input1:** aaagctgctagag, **output1:** a3gct2ag2
- **input2:** aaaaaaagctaagctaag, **output2:** a6agcta2ag

The input sequence can be up to 10^6 characters long, and the largest contiguous patterns should be used. For example, for input2, "agctaagcta" should not become "agcta2gcta" but "agcta2". Any help is appreciated.

Answer 1: Explanation of the algorithm: given a sequence S with symbols s(1), s(2), …, s(N), let B(i) be the best compressed sequence of the elements s(1), s(2), …, s(i). So, for example, B(3
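The answer above is cut off right after defining B(i). As a hedged sketch of that idea (not the answer's own code), the dynamic program below computes, for every prefix length i, the shortest encoding of s(1), …, s(i) by trying every possible last segment and every period that tiles it exactly. Two assumptions: "best" is taken to mean "shortest encoded string", and the cost of a segment is the length of its unit plus the digits of its repeat count (a count of 1 is omitted). Under that reading the result can differ from the question's expected output: for input2 it finds an encoding no longer than a7gctaa2g (9 characters), which beats a6agcta2ag (10). The running time is at least cubic, so this only suits short strings, not 10^6 characters. The names `compress` and `render` are hypothetical.

```python
def compress(s):
    """Return a shortest encoding of s as a list of (unit, count) pairs."""
    n = len(s)
    best = [0] + [float("inf")] * n   # best[i]: shortest encoded length of s[:i]
    choice = [None] * (n + 1)         # choice[i]: (j, unit, count) achieving best[i]
    for i in range(1, n + 1):
        for j in range(i):            # the last segment is s[j:i]
            seg = s[j:i]
            length = i - j
            for p in range(1, length + 1):
                if length % p:
                    continue          # a period must tile the segment exactly
                k = length // p
                if seg[:p] * k != seg:
                    continue
                cost = p + (len(str(k)) if k > 1 else 0)  # count 1 is omitted
                if best[j] + cost < best[i]:
                    best[i] = best[j] + cost
                    choice[i] = (j, seg[:p], k)
    segments = []
    i = n
    while i > 0:                      # walk the stored choices back from the end
        j, unit, k = choice[i]
        segments.append((unit, k))
        i = j
    return segments[::-1]


def render(segments):
    """Flatten (unit, count) pairs into the question's a3gct2... notation."""
    return "".join(unit + (str(k) if k > 1 else "") for unit, k in segments)


if __name__ == "__main__":
    for text in ("aaagctgctagag", "aaaaaaagctaagctaag"):
        print(text, "->", render(compress(text)))
```

The function returns (unit, count) pairs rather than only the flat string because the flat notation is ambiguous on its own: "a3gct2" does not say where each repeated unit begins.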

How to plot Pie charts in haploNet Haplotype Networks {pegas}

怎甘沉沦 submitted on 2019-11-28 09:32:26
I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting identical haplotypes from different populations in the same pie chart. I can build a haplotype network with the following script:

```r
library(pegas)  # pegas also attaches ape, which provides read.dna()

x <- read.dna(file = "x.fas", format = "fasta")
h <- haplotype(x)
net <- haploNet(h)
plot(net)
```

I'd like to set, in the DNAbin data, the label of the original population of each taxon, so that I could have pie charts with different colors (for haplotypes from different populations) in the resulting network. I'd also like to remove overlapping circles in the resulting haplotype network.

how to match dna sequence pattern

帅比萌擦擦* submitted on 2019-11-27 12:25:20
Question: I am having trouble finding an approach to this problem. The input/output sequences are as follows:

- **input1:** aaagctgctagag, **output1:** a3gct2ag2
- **input2:** aaaaaaagctaagctaag, **output2:** a6agcta2ag

The input sequence can be up to 10^6 characters long, and the largest contiguous patterns should be used. For example, for input2, "agctaagcta" should not become "agcta2gcta" but "agcta2". Any help is appreciated.

Answer 1: Explanation of the algorithm: given a sequence S with symbols s(1),

Search for string allowing for one mismatch in any location of the string

佐手、 submitted on 2019-11-27 12:06:40
I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 of them and need to look for each sequence in the entire genome (of the parasite Toxoplasma gondii). I am not sure how large the genome is, but it is much longer than 230,000 sequences. I need to look for each of my 25-character sequences, for example AGCCTCCCATGATTGAACAGATCAT. The genome is formatted as a continuous string, i.e. (CATGGGAGGCTTGCGGAGCCTGAGGGCGGAGCCTGAGGTGGGAGGCTTGCGGAGTGCGGAGCCTGAGCCTGAGGGCGGAGCCTGAGGTGGGAGGCTT....). I don't care where or how many times a sequence is found, only whether it is found at all. This is
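One standard way to allow a single mismatch is the pigeonhole principle: split the 25-mer into two halves. If some genome window differs from the query in at most one position, that mismatch falls in one half, so the other half must occur exactly. The sketch below finds exact occurrences of each half with `str.find` and then verifies the whole window by counting mismatches (substitutions only; insertions and deletions are not handled). It is a minimal single-query illustration with hypothetical function names; for 230,000 queries you would probably index the genome once (for example, a dictionary from 12-mers to positions) instead of rescanning it for every query.

```python
def within_one_mismatch(a, b):
    """True if equal-length strings a and b differ in at most one position."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                return False
    return True


def occurs_with_one_mismatch(query, genome):
    """True if query occurs in genome with at most one substituted base."""
    n = len(query)
    half = n // 2
    # Pigeonhole: a single mismatch lies in one half, so the other half
    # must occur exactly. Each seed remembers its offset inside the query.
    seeds = ((query[:half], 0), (query[half:], half))
    for piece, offset in seeds:
        hit = genome.find(piece)
        while hit != -1:
            start = hit - offset          # where the whole query would begin
            if start >= 0 and start + n <= len(genome):
                if within_one_mismatch(query, genome[start:start + n]):
                    return True
            hit = genome.find(piece, hit + 1)
    return False


if __name__ == "__main__":
    query = "AGCCTCCCATGATTGAACAGATCAT"
    genome = "CATGGG" + "T" + query[1:] + "GGCTT"   # one substitution at position 0
    print(occurs_with_one_mismatch(query, genome))
```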

How to plot Pie charts in haploNet Haplotype Networks {pegas}

霸气de小男生 submitted on 2019-11-27 02:59:58
Question: I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting identical haplotypes from different populations in the same pie chart. I can build a haplotype network with the following script:

```r
library(pegas)  # pegas also attaches ape, which provides read.dna()

x <- read.dna(file = "x.fas", format = "fasta")
h <- haplotype(x)
net <- haploNet(h)
plot(net)
```

I'd like to set, in the DNAbin data, the label of the original population of each taxon, so that I could have pie charts with different colors (for haplotypes from different populations) in the

Valid characters in a String

百般思念 submitted on 2019-11-26 23:46:46
Question: I am given a string and have to return False if it contains one or more invalid characters, and True otherwise. The caveat is that I can only use built-in functions, str operations (for example: in, +, indexing, len) and recursion. What I have so far is not working:

```python
def is_valid_sequence(dna):
    """ (str) -> bool

    Return True if and only if the DNA sequence is valid (A, T, C, and G)
    :param dna: string sequence
    :return: true or false
    >>> is_valid_sequence('ATTAC')
    True
    >>> is_valid_sequence('FBSSS')
```
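A minimal recursive version that stays within the stated constraints: `in`, indexing, `len` and recursion, with slicing assumed to count as an allowed str operation. Treating the empty string as valid is an assumption, since the question does not say. Each character adds a stack frame, so strings longer than Python's default recursion limit (about 1000) would raise a `RecursionError`.

```python
def is_valid_sequence(dna):
    """Return True iff every character of dna is one of A, T, C and G.

    >>> is_valid_sequence('ATTAC')
    True
    >>> is_valid_sequence('FBSSS')
    False
    """
    if len(dna) == 0:
        return True                    # nothing left to check (assumed valid)
    if dna[0] not in 'ATCG':
        return False                   # first invalid character found
    return is_valid_sequence(dna[1:])  # check the rest recursively
```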

How to plot a gene graph for a DNA sequence, say ATGCCGCTGCGC?

扶醉桌前 submitted on 2019-11-26 22:35:21
Question: I need to generate a random walk based on the DNA sequence of a virus, given its sequence of 2k base pairs. The sequence looks like "ATGCGTCGTAACGT". The path should turn right for an A, left for a T, go upwards for a G and downwards for a C. How can I use Matlab, Mathematica or SPSS for this purpose?

Answer 1: I did not previously know of Mark McClure's blog about the Chaos Game representation of gene sequences, but it reminded me of an article by Jose Manuel Gutiérrez (The
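The question asks for Matlab, Mathematica or SPSS; as a tool-neutral illustration, the walk itself is sketched below in Python. It reads "turn right/left" as one unit step along +x / -x, matching the absolute "upwards/downwards" for G and C, which is an assumption, and it skips any character other than A, T, G and C. The function name `dna_walk` is hypothetical; the returned list of points can be handed to any plotting tool.

```python
# Unit step for each base: A -> right, T -> left, G -> up, C -> down.
STEPS = {'A': (1, 0), 'T': (-1, 0), 'G': (0, 1), 'C': (0, -1)}


def dna_walk(seq):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        if base not in STEPS:
            continue                  # skip anything that is not A, T, G or C
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    points = dna_walk("ATGCGTCGTAACGT")
    print(points[-1])
```

The final point is simply (count of A minus count of T, count of G minus count of C), which gives a quick sanity check on a 2 kb sequence.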

sequence logos in matplotlib: aligning xticks

柔情痞子 submitted on 2019-11-26 16:48:11
Question: I am trying to draw sequence logos using matplotlib. The entire code is available on gist. The relevant portion is:

```python
import matplotlib.patheffects
import matplotlib.pyplot as plt


class Scale(matplotlib.patheffects.RendererBase):
    # Path effect that stretches each glyph by (sx, sy) before drawing it.
    def __init__(self, sx, sy=None):
        self._sx = sx
        self._sy = sy

    def draw_path(self, renderer, gc, tpath, affine, rgbFace):
        affine = affine.identity().scale(self._sx, self._sy) + affine
        renderer.draw_path(gc, tpath, affine, rgbFace)


def draw_logo(all_scores):
    fig = plt.figure()
    fig.set_size_inches(len(all_scores), 2.5)
    ax = fig.add
```