bioinformatics | 易学教程

Execute an external BLAST program in PHP

I want to run a blastx search from PHP instead of from a Linux terminal. The actual command line would be (see the reference for the argument definitions):
./blastx -query $input -db ${Sbjct}_db -evalue 0.0001 -outfmt 6 -out /path/to/output.tsv
Here is part of my PHP code:
exec('/path/to/blastx -query /path/to/PAO1.fasta -db /path/to/VFDB_setB_pro -evalue 0.0001 -outfmt 6 -out /path/to/output.tsv');
However, when I call exec() in my PHP program, nothing happens. I also tried another method, but it returns error code 1. Here is my PHP exec() call: exec('sh /path/to
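PHP's exec() only collects standard output, so a command that fails can look as if it did nothing; appending 2>&1 to the command and passing exec()'s optional output and result-code arguments is one common way to see the error message. The same diagnostic idea is sketched below in Python (a swapped-in language, not the asker's PHP): pass the blastx options as a list and capture the exit code and stderr. The paths are the placeholders from the question, and actually running blastx assumes BLAST+ is installed at that path.

```python
import subprocess


def build_blastx_cmd(blastx, query, db, out, evalue="0.0001", outfmt="6"):
    """Build the blastx command as an argument list (no shell quoting needed)."""
    return [blastx, "-query", query, "-db", db,
            "-evalue", evalue, "-outfmt", outfmt, "-out", out]


def run_and_report(cmd):
    """Run a command and return (exit code, captured stderr text)."""
    # capture_output=True keeps stderr, which is where BLAST+ reports errors.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Placeholder paths taken from the question; adjust to your installation.
# code, err = run_and_report(build_blastx_cmd(
#     "/path/to/blastx", "/path/to/PAO1.fasta",
#     "/path/to/VFDB_setB_pro", "/path/to/output.tsv"))
# print(code, err)
```

If the blastx path itself is wrong, subprocess.run raises FileNotFoundError rather than returning an exit code, which separates "program not found" from "program ran and failed".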

Need help with peak signal detection in Perl

Question: Hi everyone, I have some intensity values from images of yeast colony plates, and I need to find the peak values among them. Below is an example image showing how the values look when graphed. Example of some of the values: 5.7 5.3 8.2 16.5 34.2 58.8 **75.4** 75 65.9 62.6 58.6 66.4 71.4 53.5 40.5 26.8 14.2 8.6 5.9 7.7 14.9 30.5 49.9 69.1 **75.3** 69.8 58.8 57.2 56.3 67.1 69 45.1 27.6 13.4 8 5 These values show two peaks, at 75.4 and 75.3; you can see that the values
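One simple way to report only the main peaks, and not the smaller bumps such as 71.4 or 69 on their shoulders, is to split the signal into runs that stay above a threshold and keep the maximum of each run. This is a sketch in Python rather than the asker's Perl, and the threshold of 20 is a hypothetical choice for this sample, not something the question specifies; the truncated excerpt does not state the exact peak criterion.

```python
def find_peaks(values, threshold):
    """Return (index, value) for the maximum of each run of values >= threshold.

    A run starts when the signal rises to the threshold and ends when it
    drops below it, so small local maxima inside one run are ignored.
    """
    peaks = []
    best = None  # (index, value) of the current run's maximum
    for i, v in enumerate(values):
        if v >= threshold:
            if best is None or v > best[1]:
                best = (i, v)
        elif best is not None:
            peaks.append(best)  # run just ended: record its maximum
            best = None
    if best is not None:  # signal ended while still above the threshold
        peaks.append(best)
    return peaks


intensities = [5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4, 75, 65.9, 62.6, 58.6,
               66.4, 71.4, 53.5, 40.5, 26.8, 14.2, 8.6, 5.9, 7.7, 14.9, 30.5,
               49.9, 69.1, 75.3, 69.8, 58.8, 57.2, 56.3, 67.1, 69, 45.1, 27.6,
               13.4, 8, 5]
print(find_peaks(intensities, threshold=20))  # hypothetical threshold
# [(6, 75.4), (24, 75.3)]
```

The threshold trades sensitivity for robustness: too low and neighbouring colonies merge into one run, too high and faint colonies are missed.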

How to extract chains from a PDB file?

Question: I would like to extract chains from PDB files. I have a file named pdb.txt that contains PDB IDs, as shown below. The first four characters are the PDB ID and the last character is the chain ID. 1B68A 1BZ4B 4FUTA I would like to 1) read the file line by line, 2) download the atomic coordinates of each chain from the corresponding PDB file, and 3) save the output to a folder. I used the following script to extract the chains, but it only outputs the A chains from the PDB files. for i in 1B68 1BZ4 4FUT
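The three steps can be sketched in Python using only the standard library; the key point is to read the chain ID from each entry in pdb.txt instead of fixing it. The sketch assumes PDB-format files, where the chain identifier sits in column 22 of ATOM/HETATM records, and assumes RCSB's https://files.rcsb.org/download/ID.pdb URL pattern for downloads. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path


def parse_ids(lines):
    """Split entries like '1B68A' into ('1B68', 'A'): 4-char PDB ID + chain ID."""
    pairs = []
    for line in lines:
        entry = line.strip()
        if entry:
            pairs.append((entry[:4], entry[-1]))
    return pairs


def extract_chain(pdb_text, chain_id):
    """Keep ATOM/HETATM records whose chain identifier (column 22) matches."""
    return [line for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))
            and len(line) > 21 and line[21] == chain_id]


def fetch_pdb(pdb_id):
    # Assumption: RCSB's plain-file download URL pattern.
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def save_chains(id_file, out_dir):
    """Hypothetical driver: one output file per entry, e.g. out_dir/1BZ4B.pdb."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdb_id, chain in parse_ids(Path(id_file).read_text().splitlines()):
        atoms = extract_chain(fetch_pdb(pdb_id), chain)
        (out / f"{pdb_id}{chain}.pdb").write_text("\n".join(atoms) + "\n")
```

Called as save_chains("pdb.txt", "chains"), this writes each requested chain to its own file. For multi-model entries (e.g. NMR structures) it keeps the chain's records from every model.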

Modify r object with rpy2

I'm trying to use rpy2 to call the DESeq2 R/Bioconductor package from Python. I actually solved my problem while writing my question (do_slots gives access to an R object's attributes), but I think the example might be useful to others, so here is how I do it in R and how it translates into Python. In R, I can create a "DESeqDataSet" from two data frames as follows: counts_data <- read.table("long/path/to/file", header=TRUE, row.names="gene") head(counts_data) ## WT_RT_1 WT_RT_2 prg1_RT_1 prg1_RT_2 ## aap-1 406 311 41 95 ## aat-1 5 8 2 0 ## aat-2 1 1 0 0 ## aat-3 13 12 0 1 ## aat-4 6 6 2 3 ##

Biopython: How to exclude particular amino acid sequences from a protein when making a Ramachandran plot?

I have written a Python script that uses Biopython to draw the Ramachandran plot of the ubiquitin protein from a PDB file. My script is below: import Bio.PDB import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt phi_psi = ([0,0]) phi_psi = np.array(phi_psi) pdb1 ='/home/devanandt/Documents/VMD/1UBQ.pdb' for model in Bio.PDB.PDBParser().get_structure('1UBQ',pdb1) : for chain in model : polypeptides = Bio.PDB.PPBuilder().build_peptides(chain) for poly_index, poly in enumerate(polypeptides) : print "Model %s Chain %s" % (str(model.id), str(chain.id)), print
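Biopython's Polypeptide.get_phi_psi_list() returns one (phi, psi) pair per residue, in radians, with None where an angle is undefined at a chain end. So one way to leave residues out of the plot is to filter that list by residue name before plotting. The sketch below keeps only the filtering step; the excluded set ("GLY", "PRO") is a hypothetical default, not something the question specifies.

```python
import math


def filter_phi_psi(resnames, phi_psi, exclude=("GLY", "PRO")):
    """Return (phi, psi) in degrees, skipping excluded residues and None angles.

    resnames: residue names in chain order, e.g. from residue.get_resname().
    phi_psi:  matching (phi, psi) pairs in radians, e.g. poly.get_phi_psi_list().
    exclude:  hypothetical set of residue names to leave out of the plot.
    """
    kept = []
    for name, (phi, psi) in zip(resnames, phi_psi):
        if name in exclude or phi is None or psi is None:
            continue  # excluded residue, or undefined angle at a chain end
        kept.append((math.degrees(phi), math.degrees(psi)))
    return kept


# Inside the question's loop over polypeptides (requires Biopython):
# names = [res.get_resname() for res in poly]
# points = filter_phi_psi(names, poly.get_phi_psi_list(), exclude={"GLY"})
```

The returned degree pairs can go straight into plt.scatter, for example after unpacking them with zip(*points).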

Remove rows from a data frame that contain only zeros or at least one zero

Question: I am trying to write an R function that filters my data set based on whether any column in a row contains a zero. Sometimes, however, I only want to remove rows that are zero in all columns. And, this is where it gets fun, not all columns contain numbers, and the number of columns can vary. I have pasted some of my data here with the results I want to obtain. Unfiltered: ID GeneName DU145small DU145total PC3small PC3total 1 MIR22HG 33221.5 1224
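The logic such a function needs (find the numeric columns, then drop a row if any or all of them are zero) is sketched below in Python over a list of row dictionaries. This is a swapped-in representation, not the asker's R data frame; in R, the analogous steps would select columns with sapply(df, is.numeric) and count zeros per row with rowSums. The ignore parameter is a hypothetical addition for numeric columns, such as ID, that should not count.

```python
def is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def filter_zero_rows(rows, mode="any", ignore=()):
    """Drop rows based on zeros in their numeric columns.

    mode="any": drop a row if any numeric column is 0.
    mode="all": drop a row only if every numeric column is 0.
    Columns named in `ignore` (e.g. a numeric ID) are not checked.
    Rows without any numeric column are always kept.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    test = any if mode == "any" else all
    kept = []
    for row in rows:
        nums = [v for k, v in row.items() if k not in ignore and is_number(v)]
        if nums and test(v == 0 for v in nums):
            continue  # row matches the zero criterion: drop it
        kept.append(row)
    return kept
```

Because numeric columns are detected per value, the same function works no matter how many sample columns the table has.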

R: How to change the column names in a data frame based on a specification

I have a data frame; the start of it is below:

                   SM_H1455 SM_V1456 SM_K1457 SM_X1461 SM_K1462
ENSG00000000419.8       290      270      314      364      240
ENSG00000000457.8       252      230      242      220      106
ENSG00000000460.11      154      158      162      136       64
ENSG00000000938.7     20106    18664    19764    15640    19024
ENSG00000000971.11       30       10        4        2       10

Note that there are many more columns and rows. I want to rename the columns. The most important information in a column's name, e.g. SM_H1455, is the 4th character of the string; in this case it's an H. What I want to do is to change the "SM" part to "Control" if the
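The renaming rule can be sketched as a lookup on the 4th character of each name (index 3, i.e. substr(name, 4, 4) in R). The exact condition is cut off in the question, so the mapping {"H": "Control"} below is purely hypothetical. The sketch is in Python rather than R.

```python
def rename_columns(names, prefix_for_letter, old_prefix="SM"):
    """Replace the leading prefix according to each name's 4th character.

    prefix_for_letter: hypothetical mapping, e.g. {"H": "Control"}.
    Names whose 4th character is not in the mapping are left unchanged.
    """
    renamed = []
    for name in names:
        letter = name[3] if len(name) > 3 else None  # 'SM_H1455'[3] == 'H'
        new_prefix = prefix_for_letter.get(letter)
        if new_prefix and name.startswith(old_prefix):
            name = new_prefix + name[len(old_prefix):]
        renamed.append(name)
    return renamed


cols = ["SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_K1462"]
print(rename_columns(cols, {"H": "Control"}))
# ['Control_H1455', 'SM_V1456', 'SM_K1457', 'SM_X1461', 'SM_K1462']
```

Keeping the rule in a dictionary means adding a second group (another letter, another prefix) needs no change to the function.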

How to get rank-specific taxonomic IDs for kingdom, phylum, class, order, family, genus and species from a taxid?

Question: I have a list of taxids that looks like this: 1204725 2162 1300163 420247 I would like to produce a file with the taxonomic IDs for the taxids above, in this order: kingdom_id phylum_id class_id order_id family_id genus_id species_id I am using the "ete3" package, specifically the ete-ncbiquery tool, which reports the lineage of the IDs above. (I run it on my Linux laptop with the command below.) ete3 ncbiquery --search 1204725 2162 13000163 420247 --info The result looks like this: # Taxid Sci.Name Rank
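With ete3's Python API, NCBITaxa().get_lineage(taxid) returns the root-to-leaf list of taxids, and get_rank(lineage) maps each of them to its rank name. Picking one ID per requested rank is then a dictionary lookup. The sketch below keeps that step as a pure function so it can be tested without the NCBI database; the helper names are hypothetical. Depending on the taxonomy dump, the top rank may be labelled superkingdom rather than kingdom, so RANKS may need adjusting.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def ids_by_rank(lineage, rank_of, ranks=RANKS):
    """Pick the taxid at each requested rank from a root-to-leaf lineage.

    lineage: list of taxids, e.g. from NCBITaxa().get_lineage(taxid).
    rank_of: dict taxid -> rank name, e.g. from NCBITaxa().get_rank(lineage).
    Ranks that do not occur in the lineage come back as None.
    """
    found = {rank_of.get(taxid): taxid for taxid in lineage}
    return [found.get(rank) for rank in ranks]


def to_tsv_row(taxid, ids):
    """One output line: the query taxid, then one column per rank (blank if missing)."""
    return "\t".join([str(taxid)] + ["" if i is None else str(i) for i in ids])


# With ete3 installed and its database downloaded:
# from ete3 import NCBITaxa
# ncbi = NCBITaxa()
# for taxid in [1204725, 2162, 1300163, 420247]:
#     lineage = ncbi.get_lineage(taxid)
#     print(to_tsv_row(taxid, ids_by_rank(lineage, ncbi.get_rank(lineage))))
```

Leaving missing ranks blank keeps every row the same width, so the output loads cleanly as a TSV.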

Rotate upper triangle of a ggplot tile heatmap

I've plotted a heatmap like this: ggplot(test, aes(start1, start2)) + geom_tile(aes(fill = logFC), colour = "gray", size=0.05) + scale_fill_gradientn(colours=c("#0000FF","white","#FF0000"), na.value="#DAD7D3") This plots the upper triangle of the heatmap. What I'd like to plot is the very same triangle, but with the hypotenuse as the x-axis. How would I do that? Edit: added a reproducible example: library(ggplot2) # dummy data df1 <- mtcars[, c("gear","carb", "mpg")] # normal tile plot gg1 <- ggplot(df1, aes(gear, carb, fill = mpg)) + geom_tile() + xlim(c(1, 10)) + ylim(c(1, 10)) + theme_void()
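Putting the hypotenuse on the x-axis amounts to rotating the plane by 45 degrees. The map u = (x + y) / 2, v = (y - x) / 2 sends the diagonal y = x to v = 0 and the upper triangle (y > x) to v > 0, and each square tile becomes a diamond. The sketch below computes the four diamond corners for one tile, in Python rather than R. It assumes unit-width tiles centred on integer coordinates; in ggplot2 the resulting corners would be drawn with geom_polygon (one group per tile) instead of geom_tile.

```python
def tile_to_diamond(x, y, half=0.5):
    """Map the square tile centred at (x, y) to a diamond lying on the diagonal.

    Uses u = (x + y) / 2, v = (y - x) / 2: a 45-degree rotation scaled by
    1/sqrt(2), so the line y = x becomes the horizontal axis v = 0.
    Returns the four corners in drawing order, ready for a polygon layer.
    """
    def rotate(px, py):
        return ((px + py) / 2, (py - px) / 2)

    corners = [(x - half, y - half), (x + half, y - half),
               (x + half, y + half), (x - half, y + half)]
    return [rotate(px, py) for px, py in corners]


print(tile_to_diamond(1, 3))
# [(1.5, 1.0), (2.0, 0.5), (2.5, 1.0), (2.0, 1.5)]
```

Because the transform is linear, neighbouring tiles still share edges after rotation, so the heatmap stays gap-free.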

Improving clojure lazy-seq usage for iterative text parsing

I'm writing a Clojure implementation of this coding challenge, which asks for the average length of the sequence records in a FASTA file:

>1
GATCGA
GTC
>2
GCA
>3
AAAAA

For more background, see this related StackOverflow post about an Erlang solution. My beginner Clojure attempt uses lazy-seq to read the file one record at a time so that it will scale to large files. However, it is fairly memory-hungry and slow, so I suspect it's not implemented optimally. Here is a solution that uses the BioJava library to abstract away the parsing of the records: (import '(org.biojava.bio.seq.io
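For comparison, the streaming idea can be sketched in Python with a generator, which plays the role of the lazy sequence. Each record's length is computed and yielded as soon as the next header appears, and only a running total and count are kept, so memory stays constant regardless of file size. This is a swapped-in language, not the asker's Clojure, and it assumes plain FASTA with ">" header lines.

```python
def record_lengths(handle):
    """Yield the sequence length of each FASTA record, one record at a time."""
    length = None  # None until the first header has been seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # previous record is complete
            length = 0
        elif line and length is not None:
            length += len(line)  # sequence may wrap over several lines
    if length is not None:
        yield length  # last record in the file


def average_length(handle):
    """Average record length without storing the records themselves."""
    total = count = 0
    for n in record_lengths(handle):
        total += n
        count += 1
    return total / count if count else 0.0


# Hypothetical file name:
# with open("seqs.fasta") as fh:
#     print(average_length(fh))
```

For the three records above, the lengths are 9, 3 and 5, so the average is 17/3 (about 5.67).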