bioinformatics

ImportError: cannot import name _aligners [biopython]

情到浓时终转凉″ submitted on 2019-12-10 18:55:50
Question: I am doing bioinformatics work that depends on Biopython. Biopython always gives me the following error: ImportError: cannot import name _aligners. I hope someone can help me with this issue. Thank you!

Answer 1: This can occur with Biopython versions >= 1.72 and has been discussed on the Biopython mailing list. The error occurs when you try to import Biopython while inside the biopython/ source directory. To fix it, move to a directory outside the source tree and then run your code. If the error still occurs then likely the
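One way to check for the situation the answer describes is to look for a local directory that has the same name as the package. The sketch below uses only the standard library. The helper name cwd_shadows_package is hypothetical, and the check assumes the package directory is called Bio, which is the name Biopython is imported under.

```python
import os


def cwd_shadows_package(package_name, cwd=None):
    """Return True if `cwd` contains a directory named like the package.

    Such a directory is found before the installed copy when Python
    resolves the import from that working directory.
    """
    base = os.getcwd() if cwd is None else cwd
    return os.path.isdir(os.path.join(base, package_name))


if cwd_shadows_package("Bio"):
    print("A local Bio/ directory may shadow the installed Biopython; "
          "try running from another directory.")
else:
    print("No local Bio/ directory found in the current directory.")
```

If the check reports a local Bio/ directory, changing to a different working directory before running the script is the fix the answer suggests.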

Find nucleotides in a DNA sequence with Perl

泪湿孤枕 submitted on 2019-12-10 18:37:36
Question: I have a DNA sequence and I want to find the nucleotides at a position chosen by the user. For example: Enter the DNA sequence: ACTAAAAATACAAAAATTAGCCAGGCGTGGTGGCAC (the length of sequence is 33) Enter the position: (12) I expect the result to say that the nucleotides at position 12 are AAA. I have no problem finding the amino acid at the position. Below is my current code. print "ENTER THE FILENAME OF THE DNA SEQUENCE:= "; $DNAfilename = <STDIN>; chomp
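The lookup the question describes comes down to converting a 1-based position into a 0-based index and taking a slice. Here is a minimal sketch in Python rather than Perl. The function name nucleotides_at and the default length of three nucleotides are assumptions made for illustration.

```python
def nucleotides_at(sequence, position, length=3):
    """Return `length` nucleotides starting at a 1-based `position`."""
    if position < 1 or position > len(sequence):
        raise ValueError("position is outside the sequence")
    start = position - 1  # convert the 1-based position to a 0-based index
    return sequence[start:start + length]


# Sequence and position taken from the question.
dna = "ACTAAAAATACAAAAATTAGCCAGGCGTGGTGGCAC"
print(nucleotides_at(dna, 12))
```

Near the end of the sequence the slice simply returns fewer than three nucleotides instead of raising an error.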

How to combine intervals data into fewer intervals in R?

限于喜欢 submitted on 2019-12-10 18:33:51
Question: I am trying to collapse a series of intervals into fewer, equally meaningful intervals. Consider, for example, this list of intervals: Intervals = list( c(23,34), c(45,48), c(31,35), c(7,16), c(5,9), c(56,57), c(55,58) ) Because the intervals overlap, the same ranges can be described with fewer vectors. Plotting these intervals makes it obvious that a list of 4 vectors would be enough: plot(1,1,type="n",xlim=range(unlist(Intervals)),ylim=c(0.9,1.1)) segments( x0=sapply(Intervals,"[",1), x1=sapply
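The collapsing step can be sketched as a sort followed by a single pass that extends the last merged interval whenever the next one overlaps it. The example below is written in Python rather than R, and the function name merge_intervals is hypothetical. It treats intervals that touch at an endpoint as overlapping, which is an assumption about the intended behaviour.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into the fewest intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


# The intervals from the question.
intervals = [(23, 34), (45, 48), (31, 35), (7, 16), (5, 9), (56, 57), (55, 58)]
print(merge_intervals(intervals))
# [(5, 16), (23, 35), (45, 48), (55, 58)]
```

The four merged intervals match the count the asker reads off the plot.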

Perl script to search for a motif in a multi-FASTA file and print the complete sequence along with the header line

守給你的承諾、 submitted on 2019-12-10 17:32:30
Question: I am able to search for a motif in a multi-FASTA file and print the line containing the motif, but I need to print the complete sequence, along with its header line, for every FASTA record that contains the motif. Please help me; I am just a beginner in Perl. #!/usr/bin/perl -w use strict; print STDOUT "Enter the motif: "; my $motif = <STDIN>; chomp $motif; my $line; open (FILE, "data.fa"); while ($line = <FILE>) { if ($line =~ /$motif/) { print $line; } }

Answer 1: Try this: Bio::DB::Fasta. Instructions on the page
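The record-level search the question asks for can be sketched by grouping lines into (header, sequence) records first and only then testing the motif. The example is in Python rather than Perl and does not use Bio::DB::Fasta. The function names read_fasta and records_with_motif are hypothetical. Joining the sequence lines before matching also finds motifs that span a line break, which the line-by-line script in the question would miss.

```python
import io
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def records_with_motif(handle, motif):
    """Return the records whose full sequence matches the motif regex."""
    pattern = re.compile(motif)
    return [(h, s) for h, s in read_fasta(handle) if pattern.search(s)]


# Made-up example data; a real run would pass open("data.fa") instead.
example = io.StringIO(">seq1\nACGTAC\nGTTT\n>seq2\nCCCCCC\n")
for header, sequence in records_with_motif(example, "ACGT"):
    print(header)
    print(sequence)
```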

Rename list of lists using a named list

China☆狼群 submitted on 2019-12-10 17:26:19
Question: So I'm working with a list that contains other lists inside, with this structure: library(graph) library(RBGL) library(Rgraphviz) show(tree) $`SO:0001968` $`SO:0001968`$`SO:0001622` $`SO:0001968`$`SO:0001622`$`SO:0001624` $`SO:0001968`$`SO:0001622`$`SO:0001624`$`SO:0002090` [1] 1 $`SO:0001968`$`SO:0001622`$`SO:0001623` $`SO:0001968`$`SO:0001622`$`SO:0001623`$`SO:0002091` [1] 1 $`SO:0001968`$`SO:0001969` $`SO:0001968`$`SO:0001969`$`SO:0002090` [1] 1 $`SO:0001968`$`SO:0001969`$`SO:0002091` [1]
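Renaming every level of a nested structure from a lookup table is naturally recursive. The sketch below models the nested list as a Python dict instead of an R list, and the function name rename_keys is hypothetical. The replacement labels are placeholders, because the question's actual name mapping is not shown in the excerpt.

```python
def rename_keys(tree, names):
    """Recursively rename dict keys using `names`; unknown keys are kept."""
    if not isinstance(tree, dict):
        return tree  # leaf value, nothing to rename
    return {
        names.get(key, key): rename_keys(value, names)
        for key, value in tree.items()
    }


# Part of the structure from the question, as nested dicts.
tree = {
    "SO:0001968": {
        "SO:0001622": {"SO:0001624": {"SO:0002090": 1}},
        "SO:0001969": {"SO:0002090": 1, "SO:0002091": 1},
    }
}
# Placeholder labels, not the real terms for these identifiers.
names = {"SO:0001968": "label_a", "SO:0001622": "label_b"}
print(rename_keys(tree, names))
```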

Checking if value in vector is in range of values in different length vector [duplicate]

十年热恋 submitted on 2019-12-10 17:25:39
Question: This question already has answers here: Overlap join with start and end positions (3 answers). Closed 2 years ago. So I'm working in R and have a large data frame that contains a vector of genome positions like this: 2655180 2657176 2658869 and a second data frame that has a range of positions and a gene, like this: chr1 100088228 100162167 AGL chr1 107599438 107600565 PRMT6 chr1 115215635 115238091 AMPD1 chr1 11850637 11863073 MTHFR chr1 119958143 119965343 HSD3B2 chr1 144124628
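The lookup itself is a range test of each position against each (start, end) pair. Here is a minimal sketch in Python rather than R, with a hypothetical function name genes_for_position. It ignores the chromosome column and assumes inclusive ends. The ranges come from the question, while the second and third query positions are made up so that the example produces matches.

```python
def genes_for_position(position, ranges):
    """Return the genes whose inclusive [start, end] range holds `position`."""
    return [gene for start, end, gene in ranges if start <= position <= end]


ranges = [
    (100088228, 100162167, "AGL"),
    (107599438, 107600565, "PRMT6"),
    (115215635, 115238091, "AMPD1"),
]
# First position is from the question; the other two are invented.
for position in (2655180, 100100000, 107600000):
    print(position, genes_for_position(position, ranges))
```

This linear scan is fine for small tables. For large data frames an overlap join, as in the linked duplicate, scales better.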

Can Biopython perform Seq.find() accounting for ambiguity codes

情到浓时终转凉″ submitted on 2019-12-10 13:58:11
Question: I want to be able to search a Seq object for a subsequence Seq object, accounting for ambiguity codes. For example, the following should be true: from Bio.Seq import Seq from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA amb = IUPACAmbiguousDNA() s1 = Seq("GGAAAAGG", amb) s2 = Seq("ARAA", amb) # R = A or G print s1.find(s2) If ambiguity codes were taken into account, the answer should be >>> 2 But the answer I get is that no match is found, or >>> -1 Looking at the Biopython source code, it
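One way to get the behaviour the question asks for, without relying on Seq.find(), is to translate the query's IUPAC ambiguity codes into a regular expression and search plain strings. This is a sketch with a hypothetical function name, find_ambiguous, not a Biopython API. It assumes that only the query contains ambiguity codes and that the target is unambiguous DNA.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_ambiguous(target, query):
    """Return the 0-based index of the first match of `query`, or -1."""
    parts = []
    for code in query.upper():
        bases = IUPAC_DNA.get(code)
        # Codes outside the table are matched literally.
        parts.append("[" + bases + "]" if bases else re.escape(code))
    match = re.search("".join(parts), target.upper())
    return match.start() if match else -1


print(find_ambiguous("GGAAAAGG", "ARAA"))  # 2
```

With a Seq object, str(seq) yields the plain string to pass in.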

Changing the x-axis of seqlogo figures in MATLAB

牧云@^-^@ submitted on 2019-12-10 13:41:54
Question: I'm making a large number of seqlogos programmatically. They are hundreds of columns wide, so running seqlogo normally creates letters that are too thin to see. I've noticed that I only care about a few of these columns (not necessarily consecutive columns) ... most are noise but some are highly conserved. I use something like this snippet: wide_seqs = cell2mat(arrayfun(@randseq, repmat(200, [500 1]), 'uniformoutput', false)); wide_seqs(:, [17,30, 55,70,130]) = repmat(['ATCGG'], [500 1])

'StringCut' to the left or right of a defined position using Mathematica

三世轮回 submitted on 2019-12-10 02:50:21
Question: On reading this question, I thought the following problem would be simple using StringSplit. Given the following string, I want to 'cut' it to the left of every "D" such that: (1) I get a List of fragments (with sequence unchanged); (2) StringJoin @fragments gives back the original string (but it does not matter if I have to reorder the fragments to obtain this). That is, the sequence within each fragment is important, and I do not want to lose any characters. (The example I am interested in is a protein
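The two requirements (fragments keep their internal order, and joining them restores the input) can be met by starting a new fragment at every occurrence of the cut character. The sketch below is in Python rather than Mathematica, with a hypothetical function name cut_left_of and a made-up input string, because the original string is not shown in the excerpt.

```python
def cut_left_of(text, char):
    """Split `text` immediately to the left of every `char`, losing nothing."""
    fragments = []
    start = 0
    for index, current in enumerate(text):
        if current == char and index > start:
            fragments.append(text[start:index])
            start = index  # the cut character begins the next fragment
    if start < len(text):
        fragments.append(text[start:])
    return fragments


fragments = cut_left_of("ABDCDDE", "D")
print(fragments)  # ['AB', 'DC', 'D', 'DE']
print("".join(fragments) == "ABDCDDE")  # True
```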

SeqIO.parse on a fasta.gz

…衆ロ難τιáo~ submitted on 2019-12-10 01:58:51
Question: New to coding. New to Python/Biopython; this is my first question online, ever. How do I open a compressed fasta.gz file to extract info and perform calculations in my function? Here is a simplified example of what I'm trying to do (I've tried different ways), and what the error is. The gzip command I'm using doesn't seem to work. with gzip.open("practicezip.fasta.gz", "r") as handle: for record in SeqIO.parse(handle, "fasta"): print(record.id) Traceback (most recent call last): File "<ipython
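The traceback is cut off in the excerpt, so the cause is an assumption, but a common source of this kind of failure is that gzip.open() with mode "r" returns bytes, while a text parser expects str lines. Opening the file with mode "rt" gives a text handle. The sketch below stays within the standard library: it writes a small made-up file and reads the record IDs back. The function name fasta_ids_from_gzip is hypothetical.

```python
import gzip
import os
import tempfile


def fasta_ids_from_gzip(path):
    """Return the record IDs from a gzip-compressed FASTA file."""
    ids = []
    # "rt" opens the compressed file in text mode, so lines are str.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


# Write a tiny made-up example file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "practicezip.fasta.gz")
    with gzip.open(path, "wt") as out:
        out.write(">seq1 first record\nACGT\n>seq2\nGGCC\n")
    print(fasta_ids_from_gzip(path))
```

A handle opened in "rt" mode is also the kind of text handle that the SeqIO.parse(handle, "fasta") call from the question is given when it reads an uncompressed file.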