bioinformatics

Execute an external BLAST program in PHP

痴心易碎 submitted on 2019-12-22 13:59:14
Question: I want to run a blastx search from PHP instead of from a Linux console terminal. The actual command-line arguments would be (see definition of refer):
    ./blastx -query $input -db ${Sbjct}_db -evalue 0.0001 -outfmt 6 -out /path/to/output.tsv
Here is part of my PHP code:
    exec(' /path/to/blastx -query /path/to/PAO1.fasta -db /path/to/VFDB_setB_pro -evalue 0.0001 -outfmt 6 -out /path/to/output.tsv ');
However, when I call the exec() function in my PHP program, nothing happens. I also ...

Traceback through a Matrix of Directions in R

我的梦境 submitted on 2019-12-22 10:08:21
Question: I have a matrix like this: http://i.imgur.com/3HiEBm4.png
You can load it like this:
    matrix = structure(c("-", "-", "C", "G", "C", "A", "-", "0", "V", "V", "V", "V", "C", "H", "D", "V", "DV", "V", "A", "H", "H", "D", "DV", "D", "C", "H", "DH", "DH", "D", "V", "G", "H", "H", "D", "H", "D", "T", "H", "H", "H", "DH", "DH", "A", "H", "H", "H", "DH", "D", "T", "H", "H", "H", "DH", "H"), .Dim = c(6L, 9L))
Starting at the bottom-right corner, the goal is to follow the directions (D = move diagonally ...

Filter overlapping entries in bed file

痴心易碎 submitted on 2019-12-22 09:50:47
Question: I have a bed file that looks like this:
    1 183113 183114 chr1:183113-183240 0 +
    1 187286 187287 chr1:187128-187287 0 -
    1 187576 187587 chr1:187375-187577 0 -
    1 187580 187590 chr1:187379-187577 0 -
My aim is to extract only those rows whose entries do not overlap with any others. For some time I have been trying bedtools merge according to the doc. I wanted to use specific flags to count the entries that contributed to each "merged" fragment and later keep only those with the value "1", but here ...
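The stated goal (keep only rows that overlap no other row) can be sketched in plain Python as a sorted sweep. This is an illustration under assumptions, not the bedtools approach the asker was attempting: the function name `non_overlapping` is hypothetical, rows are assumed to be already parsed into tuples with integer coordinates, and intervals are treated as half-open, as in the BED convention.

```python
def non_overlapping(rows):
    """Return the rows that overlap no other row on the same chromosome.

    rows: list of tuples whose first three fields are (chrom, start, end),
    with start/end as integers in BED half-open coordinates (assumed).
    """
    # Visit rows in sorted order but remember their original positions.
    order = sorted(range(len(rows)), key=lambda i: rows[i][:3])
    overlapped = set()
    prev_chrom = None
    max_end = None   # furthest end seen so far on the current chromosome
    max_idx = None   # index of the row that reaches max_end
    for i in order:
        chrom, start, end = rows[i][:3]
        if chrom != prev_chrom:
            prev_chrom, max_end, max_idx = chrom, end, i
            continue
        if start < max_end:
            # This row starts before an earlier row ends: both overlap.
            overlapped.add(i)
            overlapped.add(max_idx)
        if end > max_end:
            max_end, max_idx = end, i
    return [row for i, row in enumerate(rows) if i not in overlapped]


rows = [
    ("1", 183113, 183114),
    ("1", 187286, 187287),
    ("1", 187576, 187587),
    ("1", 187580, 187590),
]
print(non_overlapping(rows))
```

With the four rows from the excerpt, the last two overlap each other (187580 lies inside 187576-187587), so only the first two are kept.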

Biopython: How to avoid particular amino acid sequences from a protein so as to plot Ramachandran plot?

陌路散爱 submitted on 2019-12-22 09:26:05
Question: I have written a Python script to plot the Ramachandran plot of the ubiquitin protein. I am using Biopython and working with PDB files. My script is below:
    import Bio.PDB
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    phi_psi = ([0,0])
    phi_psi = np.array(phi_psi)
    pdb1 ='/home/devanandt/Documents/VMD/1UBQ.pdb'
    for model in Bio.PDB.PDBParser().get_structure('1UBQ',pdb1) :
        for chain in model :
            polypeptides = Bio.PDB.PPBuilder().build_peptides(chain)
            for poly ...

Rotate upper triangle of a ggplot tile heatmap

醉酒当歌 submitted on 2019-12-22 07:58:11
Question: I've plotted a heatmap like this:
    ggplot(test, aes(start1, start2)) + geom_tile(aes(fill = logFC), colour = "gray", size=0.05) + scale_fill_gradientn(colours=c("#0000FF","white","#FF0000"), na.value="#DAD7D3")
This plots the upper triangle of a heatmap. What I'd like to plot is the very same triangle, but with the hypotenuse as the x-axis. How would I do that?
Edit: Added reproducible example
    library(ggplot2)
    # dummy data
    df1 <- mtcars[, c("gear","carb", "mpg")]
    # normal tile plot
    gg1 <- ...

A more complex version of “How can I tell if a string repeats itself in Python?”

徘徊边缘 submitted on 2019-12-22 03:45:16
Question: I was reading this post and I wonder whether someone can find a way to catch repetitive motifs in a more complex string. For example, find all the repetitive motifs in
    string = 'AAACACGTACGTAATTCCGTGTGTCCCCTATACGTATACGTTT'
Here are the repetitive motifs:
    'AAAC ACGTACGT AATTCC GTGTGT CCCC TATACGTATACG TTT'
So the output should be something like this:
    output = {'ACGT': {'repeat': 2, 'region': (5,13)}, 'GT': {'repeat': 3, 'region': (19,24)}, 'TATACG': {'repeat': 2, 'region': (29,40)}}
This example ...
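One simple way to look for tandem repeats like these is a backreference regular expression. The sketch below is only a starting point under assumptions: the function name `find_tandem_repeats` and the `min_unit` parameter are hypothetical, positions are 0-based and half-open, and the scan is leftmost and non-overlapping, so it can report different motifs than the output listed in the question.

```python
import re


def find_tandem_repeats(seq, min_unit=2):
    """Find non-overlapping tandem repeats with a unit of at least min_unit bases.

    Returns a list of dicts (a list rather than a dict keyed by motif,
    because the same motif can repeat in more than one region).
    """
    # ([ACGT]{n,}?) lazily captures the shortest unit; \1+ requires that the
    # unit is immediately followed by one or more copies of itself.
    pattern = re.compile(r"([ACGT]{%d,}?)\1+" % min_unit)
    results = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        results.append({
            "motif": unit,
            "repeat": len(match.group(0)) // len(unit),
            "region": (match.start(), match.end()),  # 0-based, end exclusive
        })
    return results


print(find_tandem_repeats("GTGTGT"))
```

Because `finditer` consumes each match before moving on, a short repeat that starts earlier can hide a longer repeat that overlaps it; a lookahead-based pattern would be needed to enumerate overlapping candidates.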

How to convert a set of DNA sequences into protein sequences using python programming?

被刻印的时光 ゝ submitted on 2019-12-21 20:21:55
Question: I am using Python to create a program that converts a set of DNA sequences into amino acid (protein) sequences. I then need to find a specific subsequence and count the number of sequences in which it is present. This is the code I have so far:
    #Open cDNA_sequences file and read in line by line
    with open('cDNA_sequences.csv', 'r') as results:
        for line in results:
            columns = line.rstrip("\n").split(",") #remove end of line characters and split commas to produce a list ...
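The task described (translate each DNA sequence, then count how many translations contain a given subsequence) can be sketched without external libraries. This is a minimal illustration, not the asker's code: it assumes the standard genetic code, translation in frame 1 only, and in-memory sequences; the names `translate` and `count_sequences_with_motif` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    # A trailing partial codon (1 or 2 bases) is ignored.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def count_sequences_with_motif(dna_sequences, motif):
    """Count the sequences whose translation contains the protein motif."""
    return sum(motif in translate(seq) for seq in dna_sequences)


sequences = ["ATGGCCTAA", "ATGAAA"]
print([translate(s) for s in sequences])
print(count_sequences_with_motif(sequences, "MA"))
```

Biopython's `Seq.translate` offers the same translation with alternative codon tables, which may be preferable for real data.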

How to use matchPattern() to find certain amino acids in a file with many sequences (.fasta) in R

喜你入骨 submitted on 2019-12-21 06:27:14
Question: I have a file (mydata.txt) that contains many exon sequences in FASTA format. I want to find the start ('atg') and stop ('taa', 'tga', 'tag') codons for each DNA sequence (considering the frame). I tried using matchPattern (a function from the Biostrings R package) to find these amino acids. As an example, mydata.txt could be:
    >a
    atgaatgctaaccccaccgagtaa
    >b
    atgctaaccactgtcatcaatgcctaa
    >c
    atggcatgatgccgagaggccagaataggctaa
    >d
    atggtgatagctaacgtatgctag
    >e
    atgccatgcgaggagccggctgccattgactag
    file=read ...
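The phrase "considering the frame" is the crux: a stop codon only counts if it sits a multiple of three bases after the start codon. The sketch below shows that idea in Python rather than with Biostrings' matchPattern; the function name `find_orf` is hypothetical, positions are 0-based, and only the first 'atg' in each sequence is considered.

```python
STOP_CODONS = {"taa", "tga", "tag"}


def find_orf(seq, start_codon="atg"):
    """Return (start, stop) 0-based positions of the first start codon and
    the first in-frame stop codon after it.

    Returns None if there is no start codon, and (start, None) if no
    in-frame stop codon follows it.
    """
    seq = seq.lower()
    start = seq.find(start_codon)
    if start == -1:
        return None
    # Step three bases at a time so every codon checked is in frame.
    for pos in range(start + 3, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return start, pos
    return start, None


print(find_orf("atgaatgctaaccccaccgagtaa"))
```

For sequence `a` from the excerpt, the 'taa' at position 8 is out of frame and is skipped; the in-frame stop is the final 'taa' at position 21.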

Merge overlapping numeric ranges into continuous ranges

一世执手 submitted on 2019-12-21 02:54:55
Question: I am trying to merge a set of genomic coordinate ranges into continuous ranges, with an additional option for merging across gaps. For example, if I had the genomic ranges [[0, 1000], [5, 1100]], I would want the result to be [0, 1100]. If the offset option was set to 100 and the input was [[0, 1000], [1090, 1000]], I would once again want the result to be [0, 1100]. I have implemented a way of doing this that steps through the alignments sequentially and tries to merge on the previous ending ...
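The usual way to merge ranges like this is to sort by start coordinate and extend the current range whenever the next one begins within the allowed gap. The sketch below is a minimal version under assumptions: the function name `merge_ranges` and the `gap` parameter (standing in for the question's offset option) are hypothetical, and ranges are assumed to be [start, end] pairs with start <= end.

```python
def merge_ranges(ranges, gap=0):
    """Merge [start, end] ranges that overlap or lie within `gap` of each other."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap:
            # Close enough to the previous range: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


print(merge_ranges([[0, 1000], [5, 1100]]))
print(merge_ranges([[0, 1000], [1090, 1100]], gap=100))
```

Sorting first means each range only ever needs to be compared with the most recently merged one, so the whole pass is O(n log n).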

Algorithm help! Fast algorithm in searching for a string with its partner

半城伤御伤魂 submitted on 2019-12-21 00:57:05
Question: I am looking for a fast algorithm for searching a huge string (an organism's genome sequence composed of hundreds of millions to billions of chars). Only 4 chars {A,C,G,T} are present in this string, and "A" can only pair with "T" while "C" pairs with "G". I am searching for two substrings (with the length of both substrings constrained to {minLen, maxLen}, and the interval length between them constrained to {intervalMinLen, intervalMaxLen}) that can pair with one another in antiparallel. For ...
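To make the pairing rule concrete: two substrings pair in antiparallel when one is the reverse complement of the other. The brute-force sketch below only illustrates that definition and is not the fast algorithm the question asks for. It assumes a single fixed substring length instead of the {minLen, maxLen} range, and the names `reverse_complement` and `find_paired_substrings` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antiparallel partner of seq (A<->T, C<->G, reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_paired_substrings(genome, length, min_gap, max_gap):
    """Return (i, j) pairs where genome[i:i+length] pairs in antiparallel
    with genome[j:j+length] and min_gap..max_gap bases lie between them.
    """
    hits = []
    for i in range(len(genome) - length + 1):
        target = reverse_complement(genome[i:i + length])
        first_j = i + length + min_gap
        last_j = min(i + length + max_gap, len(genome) - length)
        for j in range(first_j, last_j + 1):
            if genome[j:j + length] == target:
                hits.append((i, j))
    return hits


# "AACC" pairs with "GGTT"; three bases separate them here.
print(find_paired_substrings("AACCTTTGGTT", length=4, min_gap=3, max_gap=3))
```

The cost grows with genome length times the gap window times the substring length, which is far too slow at genome scale; an index over k-mers or a suffix structure would be the natural next step.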