bioinformatics

How to randomly extract FASTA sequences using Python?

Question: I have the following sequences in FASTA format, each with a sequence header and its nucleotides. How can I randomly extract sequences? For example, I would like to randomly select 2 sequences out of the total. The tools I have found extract by percentage, not by number of sequences. Can anyone help me? A.fasta
>chr1:1310706-1310726
GACGGTTTCCGGTTAGTGGAA
>chr1:901959-901979
GAGGGCTTTCTGGAGAAGGAG
>chr1:983001-983021
GTCCGCTTGCGGGACCTGGGG
>chr1
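
One way to pick a fixed number of records rather than a percentage is sketched below. It assumes the file fits in memory and uses only the standard library. The helper names `read_fasta` and `sample_fasta` are hypothetical, chosen for illustration, and the three records are the complete ones from the excerpt above.

```python
import random


def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; collect all of them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sample_fasta(records, n, seed=None):
    """Return n distinct records chosen uniformly at random."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(records, n)


example = """>chr1:1310706-1310726
GACGGTTTCCGGTTAGTGGAA
>chr1:901959-901979
GAGGGCTTTCTGGAGAAGGAG
>chr1:983001-983021
GTCCGCTTGCGGGACCTGGGG
""".splitlines()

records = read_fasta(example)
picked = sample_fasta(records, 2, seed=0)
for header, seq in picked:
    print(f">{header}\n{seq}")
```

`random.sample` draws without replacement, so no record is selected twice, and it raises `ValueError` if more records are requested than exist. For a real file, the lines could come from `open("A.fasta")` instead of the inline string.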

Why is collections.Counter so slow?

Question: I'm trying to solve a basic Rosalind problem: counting the nucleotides in a given sequence and returning the results in a list. For those not familiar with bioinformatics, it's just counting the number of occurrences of 4 different characters ('A', 'C', 'G', 'T') inside a string. I expected collections.Counter to be the fastest method (first because it is described as high-performance, and second because I saw a lot of people using it for this specific problem). But to my surprise this method
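
The comparison the question describes can be reproduced with a small sketch like the one below. The function names and the test sequence are assumptions made for illustration, not the asker's code. Timings depend on the machine and Python version, so the script prints them rather than asserting a winner.

```python
from collections import Counter
import timeit


def count_with_counter(seq):
    """Count A, C, G, T by building a Counter over every character."""
    counts = Counter(seq)
    return [counts[base] for base in "ACGT"]  # a missing base yields 0


def count_with_str_count(seq):
    """Count A, C, G, T with one str.count scan per base."""
    return [seq.count(base) for base in "ACGT"]


# Hypothetical input: a short DNA string repeated to make timing measurable.
seq = "AGCTTTTCATTCTGACTGCA" * 1000

# Both approaches must agree before their speed is compared.
assert count_with_counter(seq) == count_with_str_count(seq)

for fn in (count_with_counter, count_with_str_count):
    elapsed = timeit.timeit(lambda: fn(seq), number=20)
    print(f"{fn.__name__}: {elapsed:.4f} s")
```

`Counter` tallies every distinct character in a single pass, while `str.count` makes one pass per base; with only four bases to look for, the second approach does less bookkeeping per character.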

Why can't python find some modules when I'm running CGI scripts from the web?

Question: I have no idea what could be the problem here: I have some modules from Biopython which I can import easily when using the interactive prompt or executing Python scripts via the command line. The problem is, when I try to import the same Biopython modules in a web-executable CGI script, I get an ImportError: "No module named Bio". Any ideas here? Answer 1: Here are a couple of possibilities: Apache (on Unix) generally runs as a different user, and with a different environment, to python from
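
To see the environment difference the answer points to, a small diagnostic CGI script along the following lines can be run both from the command line and through the web server, and the two outputs compared. This is a sketch only; `environment_report` is a hypothetical helper name, and it assumes the server is configured to execute Python CGI scripts.

```python
import os
import sys


def environment_report():
    """Describe the interpreter, user and module search path in plain text."""
    user_id = os.getuid() if hasattr(os, "getuid") else "n/a"  # no getuid on Windows
    lines = [
        f"executable: {sys.executable}",
        f"version: {sys.version.split()[0]}",
        f"user id: {user_id}",
        f"PYTHONPATH: {os.environ.get('PYTHONPATH', '(not set)')}",
        "sys.path:",
    ]
    lines.extend(f"  {entry}" for entry in sys.path)
    try:
        import Bio  # Biopython's top-level package
        lines.append(f"Bio found at: {Bio.__file__}")
    except ImportError as exc:
        lines.append(f"Bio import failed: {exc}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A CGI response is a header block, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print(environment_report())
```

If the executable or `sys.path` differs between the two runs, the web server is using a different interpreter or search path from the shell, which would explain why `Bio` is found in one context and not in the other.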

Find the intersection of overlapping ranges in two tables using data.table function foverlaps

Question: I would like to use foverlaps to find the intersecting ranges of two bed files, and collapse any rows containing overlapping ranges into a single row. In the example below I have two tables with genomic ranges. The tables are so-called "bed" files, which have zero-based start coordinates and one-based end positions for features in chromosomes. For example, START=9, STOP=20 is interpreted to span bases 10 through 20, inclusive. These bed files can contain millions of rows. The solution would

Dictionary style replace multiple items

Question: I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages. Currently I am going about it like so:
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), snp2 = c("AA", "AT", "AG", "AA"), snp3 = c(NA, "GG", "GG", "GC"), stringsAsFactors=FALSE)
foo <- replace(foo, foo == "AA", "0101")
foo <- replace(foo, foo == "AC", "0102")
foo <- replace(foo, foo == "AG", "0103")
This works fine, but it is

Remove part of string after “.”

Question: I am working with NCBI Reference Sequence accession numbers such as those in the variable a:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
To get information from the biomart package I need to remove the .1, .2, etc. after the accession numbers. I normally do this with the following code:
b <- sub("..*", "", a)
# [1] "" "" "" "" "" ""
But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

Answer 1: You just need to escape the period, because an unescaped . in a regular expression matches any character:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\.