bioinformatics

Getting intersection of two lists in python

北城余情 submitted on 2019-12-12 01:55:42
Question: I have two lists of genes that I'm analyzing. Essentially, I want to sort the elements of these lists much as a Venn diagram does: elements that occur only in list 1 go in one list, those only in list 2 go in another, and those occurring in both go in a third. My code so far:

    from Identify_Gene import Retrieve_Data  #custom class
    import argparse
    import os
    #enable use from command line
    parser = argparse.ArgumentParser(description='''\n\nFind the intersection between two
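
One way to get the three Venn-style groups described above is with Python's built-in set operations. This is a minimal sketch, not the asker's code: the function name `venn_partition` and the gene names are made up for illustration, and it assumes gene identifiers are hashable strings and that duplicates within a list can be collapsed.

```python
def venn_partition(list1, list2):
    """Split two lists into (only in list1, only in list2, in both)."""
    set1, set2 = set(list1), set(list2)
    only1 = sorted(set1 - set2)   # difference: in list1 but not list2
    only2 = sorted(set2 - set1)   # difference: in list2 but not list1
    both = sorted(set1 & set2)    # intersection: in both lists
    return only1, only2, both


# Hypothetical example data, not taken from the question.
genes_a = ["BRCA1", "TP53", "EGFR", "MYC"]
genes_b = ["TP53", "KRAS", "MYC", "PTEN"]

only_a, only_b, shared = venn_partition(genes_a, genes_b)
print("only in list 1:", only_a)
print("only in list 2:", only_b)
print("in both:", shared)
```

Sets discard the original order and any repeated entries, so the results are sorted here to make the output deterministic.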

Grep that tolerates mismatches to subset .fastq

心不动则不痛 submitted on 2019-12-12 01:27:21
Question: I am working with bash on a Linux cluster. I am trying to extract reads from a .fastq file if they contain a match to a queried sequence. Below is an example .fastq file containing three reads.

    $ cat example.fastq
    @SRR1111111.1 1/1
    CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
    +
    AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6
    @SRR1111111.2 2/1
    CTATANTATTCTATATTTATTCTAGATAAAAGCATTCTATATTTAGCATATGTCTAGCAAAAAAAA
    +
    AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE
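
The title asks for a grep that tolerates mismatches. As a rough sketch in Python rather than bash, the idea can be expressed as a sliding-window comparison that counts substitutions (Hamming distance) and keeps a read when some window stays within the allowed number. The function names, the `max_mismatches` parameter, and the short reads below are all hypothetical, and the sketch assumes strict four-line FASTQ records and substitutions only (no insertions or deletions).

```python
import io


def has_approx_match(read, query, max_mismatches):
    """Return True if query fits somewhere in read with at most
    max_mismatches substitutions (no indels)."""
    n = len(query)
    for i in range(len(read) - n + 1):
        mismatches = 0
        for a, b in zip(read[i:i + n], query):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window fails, try the next one
        else:
            return True  # window finished without exceeding the limit
    return False


def filter_fastq(handle, query, max_mismatches=1):
    """Yield 4-line FASTQ records whose sequence approximately contains query."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            break  # end of file
        if has_approx_match(record[1], query, max_mismatches):
            yield record


# Hypothetical reads, not the ones from the question.
example_fastq = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nTTTTTTTTTT\n+\nIIIIIIIIII\n"
    "@read3\nACGAACGTAC\n+\nIIIIIIIIII\n"
)

for rec in filter_fastq(io.StringIO(example_fastq), "GTACG", max_mismatches=1):
    print("\n".join(rec))
```

The pure-Python loop is quadratic in read and query length, so it is meant to show the logic rather than to be fast on large files.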

How to calculate bond angle in protein db file? [closed]

霸气de小男生 submitted on 2019-12-12 00:49:41
Question (closed as needing details or clarity): If I have the x, y, z coordinates for the protein backbone atoms (N-CA-C-N-CA-C...) as such:

    N   -14.152   0.961  4.712
    CA  -13.296   0.028  3.924
    C   -11.822   0.338  4.193
    N   -11.121  -0.642  4.703
    CA   -9.669  -0.447  4.998
    C    -8.861  -1.586  4.373

how can I calculate the "bond angles" (N_i-CA_i-C_i, CA_i-C_i-N_(i+1), C_i-N_(i+1)-CA_(i+1))?
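
The angle at a central atom B between bonds B-A and B-C follows from the dot product of the two bond vectors: cos(theta) = (BA . BC) / (|BA| |BC|). A minimal sketch, assuming the atoms are listed in backbone order so that every consecutive triple is one of the three angle types asked about; the function name `bond_angle` is illustrative, and the coordinates are the six atoms from the question.

```python
import math


def bond_angle(a, b, c):
    """Angle in degrees at atom b, formed by atoms a-b-c."""
    ba = [a[i] - b[i] for i in range(3)]  # vector from b to a
    bc = [c[i] - b[i] for i in range(3)]  # vector from b to c
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


backbone = [
    ("N", (-14.152, 0.961, 4.712)),
    ("CA", (-13.296, 0.028, 3.924)),
    ("C", (-11.822, 0.338, 4.193)),
    ("N", (-11.121, -0.642, 4.703)),
    ("CA", (-9.669, -0.447, 4.998)),
    ("C", (-8.861, -1.586, 4.373)),
]

# Slide a window of three consecutive atoms along the chain.
angles = []
for (n1, p1), (n2, p2), (n3, p3) in zip(backbone, backbone[1:], backbone[2:]):
    angles.append((f"{n1}-{n2}-{n3}", bond_angle(p1, p2, p3)))

for label, value in angles:
    print(f"{label}: {value:.1f} degrees")
```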

Reading at three different frames

若如初见. submitted on 2019-12-11 19:27:27
Question: So I'm trying to create a class that reads a DNA string in three different frames: one that starts at position 0 (the first base), another that starts at position 1 (the second base), and a third that starts at position 2 (the third base). So far, this is what I've been playing around with:

    def codons(self, frame_one, frame_two, frame_three):
        start = frame_one
        while start + 3 <=len(self.seq):
            yield (self.seq[start:start+3], start)
            start += 3
        start+1 = frame_two
        while start + 3 <
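
Rather than repeating the loop once per frame, one option is a single generator that takes the frame offset (0, 1, or 2) as a parameter. The sketch below is one possible shape, not the asker's final class: the class name `DNAReader` and the method `all_frames` are hypothetical, and only complete three-base codons are yielded.

```python
class DNAReader:
    """Reads a DNA string as codons in any of the three forward frames."""

    def __init__(self, seq):
        self.seq = seq

    def codons(self, frame):
        """Yield (codon, start_index) pairs for the given frame offset."""
        # Stop early enough that every slice has exactly three bases.
        for start in range(frame, len(self.seq) - 2, 3):
            yield self.seq[start:start + 3], start

    def all_frames(self):
        """Return a dict mapping frame offset -> list of (codon, start)."""
        return {frame: list(self.codons(frame)) for frame in range(3)}


reader = DNAReader("ATGCCGTAA")  # hypothetical example sequence
for frame, found in reader.all_frames().items():
    print(frame, found)
```

Passing the frame as an argument avoids the invalid assignment `start+1 = frame_two` in the attempt above, since the offset is simply the starting value of the loop.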

High-level data matching between 2 tables

岁酱吖の submitted on 2019-12-11 19:23:02
Question: I'm new to R and I need advice on dealing with this problem: I have 2 tables. The start of each table is shown below:

Table 1:

    SNP         Gene             Pval     Best_SNP  Best_Pval
    rs2932538   ENSG00000007341  5.6007
    rs10488631  ENSG00000064419  7.7461
    rs12537284  ENSG00000064419  4.5544
    rs3764650   ENSG00000064666  12.3401
    rs10479002  ENSG00000072682  5.0141
    rs6704644   ENSG00000072682  6.2306
    rs2900211   ENSG00000072682  9.9022

Table 2:

    Best_SNP   Gene             Best_Pval
    rs9028922  ENSG00000007341  10.7892
    rs8233293  ENSG00000064666  89.342
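
The excerpt is cut off before the goal is stated, so the following is only a guess at the intent: it assumes the empty `Best_SNP` and `Best_Pval` columns of Table 1 should be filled from Table 2 by matching on `Gene`. It is sketched in Python rather than R, using plain dictionaries; the variable names are illustrative, and only the first complete row of Table 2 is used because the last value in the excerpt may be truncated.

```python
# A few rows from Table 1 (Best_* columns start out empty).
table1 = [
    {"SNP": "rs2932538", "Gene": "ENSG00000007341", "Pval": 5.6007},
    {"SNP": "rs10488631", "Gene": "ENSG00000064419", "Pval": 7.7461},
    {"SNP": "rs12537284", "Gene": "ENSG00000064419", "Pval": 4.5544},
]

# First row of Table 2.
table2 = [
    {"Best_SNP": "rs9028922", "Gene": "ENSG00000007341", "Best_Pval": 10.7892},
]

# Index Table 2 by gene for constant-time lookup
# (assumes at most one row per gene in Table 2).
best_by_gene = {row["Gene"]: row for row in table2}

merged = []
for row in table1:
    best = best_by_gene.get(row["Gene"])
    merged.append({
        **row,
        # None marks genes that have no entry in Table 2.
        "Best_SNP": best["Best_SNP"] if best else None,
        "Best_Pval": best["Best_Pval"] if best else None,
    })

for row in merged:
    print(row)
```

This corresponds to a left join on the gene column: every Table 1 row is kept, and rows without a match get empty values.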

comparing varied CSV files in python

本小妞迷上赌 submitted on 2019-12-11 19:21:50
Question: Suppose I have 2 CSV files:

File 1:

    Epitope Name,Epitope,Protein,position,position
    3606,NSRSTSLSV,FOO,10,21

File 2:

    A,B,C,D,E,F,G,H,I,J,K
    0,1,2,3,4,5,6,7,8,9,NSRSTSLSV

Essentially, I want to see if the contents of row 1 in file 1 are found in row 10 of file 2. If the contents match, I'll print a third CSV that is a new version of file 1 with a column saying found or not found. Right now, I'm getting not found for everything, which I know not to be the case. In some cases, the text from file 1
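
A sketch of the comparison using the standard `csv` module. It assumes that "row 1" and "row 10" in the question refer to the zero-based column indices 1 and 10 (the `Epitope` column of file 1 and column `K` of file 2), which is how the sample data lines up. The function name `annotate_found`, the `status` column name, and the second data row of file 1 are hypothetical. Values are stripped of surrounding whitespace as a precaution; the excerpt does not say whether that is the cause of the asker's "not found" results.

```python
import csv
import io


def annotate_found(file1, file2, col1=1, col2=10):
    """Return file1's rows with an extra found/not found column,
    based on whether column col1 appears in column col2 of file2."""
    rows2 = list(csv.reader(file2))
    # Collect all values from the target column of file 2, skipping its header.
    targets = {row[col2].strip() for row in rows2[1:] if len(row) > col2}

    reader = csv.reader(file1)
    header = next(reader)
    out = [header + ["status"]]
    for row in reader:
        status = "found" if row[col1].strip() in targets else "not found"
        out.append(row + [status])
    return out


file1_text = (
    "Epitope Name,Epitope,Protein,position,position\n"
    "3606,NSRSTSLSV,FOO,10,21\n"
    "9999,AAAAAAAAA,BAR,1,9\n"  # hypothetical row with no match
)
file2_text = (
    "A,B,C,D,E,F,G,H,I,J,K\n"
    "0,1,2,3,4,5,6,7,8,9,NSRSTSLSV\n"
)

annotated = annotate_found(io.StringIO(file1_text), io.StringIO(file2_text))

# Write the third CSV (to an in-memory buffer here instead of a real file).
buffer = io.StringIO()
csv.writer(buffer).writerows(annotated)
print(buffer.getvalue())
```

Building a set of file 2's values first means each row of file 1 is checked against all rows of file 2, not only the row at the same position.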

Plotting coverage depth in 1kb windows?

旧城冷巷雨未停 submitted on 2019-12-11 17:57:54
Question: I would like to plot average coverage depth across my genome, with chromosomes lined up in increasing order. I have calculated coverage depth per position for my genome using samtools. I would like to generate a plot (which uses 1 kb windows) like Figure 7: http://www.g3journal.org/content/ggg/6/8/2421/F7.large.jpg?width=800&height=600&carousel=1

Example dataframe:

    Chr   locus  depth
    chr1  1      20
    chr1  2      24
    chr1  3      26
    chr2  1      53
    chr2  2      71
    chr2  3      74
    chr3  1      29
    chr3  2      36
    chr3  3      39

Do I need to change the
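
The windowing step that would precede such a plot can be sketched as follows: each position is assigned to the 1 kb window that contains it, and depths are averaged per chromosome and window. This is an illustration in plain Python, not the asker's workflow; the function name `window_means` is hypothetical, positions are assumed to be 1-based, and the mean is taken over the positions actually present in the input (positions missing from the depth table are not counted as zero). Plotting itself is left out.

```python
from collections import defaultdict


def window_means(records, window=1000):
    """Average depth per (chromosome, window start) for 1-based positions."""
    sums = defaultdict(lambda: [0, 0])  # key -> [depth total, position count]
    for chrom, locus, depth in records:
        # Positions 1..1000 map to window start 1, 1001..2000 to 1001, etc.
        start = (locus - 1) // window * window + 1
        acc = sums[(chrom, start)]
        acc[0] += depth
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# The example dataframe from the question as (Chr, locus, depth) tuples.
records = [
    ("chr1", 1, 20), ("chr1", 2, 24), ("chr1", 3, 26),
    ("chr2", 1, 53), ("chr2", 2, 71), ("chr2", 3, 74),
    ("chr3", 1, 29), ("chr3", 2, 36), ("chr3", 3, 39),
]

means = window_means(records)
for (chrom, start), mean_depth in sorted(means.items()):
    print(chrom, start, round(mean_depth, 2))
```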

python bokeh legend out of the plot size

有些话、适合烂在心里 submitted on 2019-12-11 17:47:35
Question: I'm new to Python, and somebody helped me with this code, but I want to change some parameters. First, the size of the legend outside the plot: sometimes the legend entries are too long (example: D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Peptostreptococcaceae;D_5__Acetoanaerobium), and sometimes they are short (Acetoanaerobium), so I just want the legend to adjust its size automatically (in the figure the legend is not complete). Second, the label that appears when the pointer hovers

Single linkage clustering of edit distance matrix with distance threshold stopping criterion

杀马特。学长 韩版系。学妹 submitted on 2019-12-11 17:26:53
Question: I'm trying to assign flat, single-linkage clusters to sequence IDs separated by an edit distance < n, given a square distance matrix. I believe scipy.cluster.hierarchy.fclusterdata() with criterion='distance' may be a way to do this, but it isn't quite returning the clusters I'd expect for this toy example. Specifically, in the 4x4 distance matrix example below, I would expect clusters_50 (which uses t=50) to create 2 clusters, where actually it finds 3. I think the issue is that
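
For reference, `fclusterdata` takes a matrix of observation vectors rather than a precomputed distance matrix; with SciPy, a precomputed matrix would normally go through `squareform`, `linkage(..., method='single')`, and `fcluster(..., criterion='distance')`. The sketch below deliberately swaps in a different but equivalent technique that needs no SciPy: flat single-linkage clusters at a distance threshold are the connected components of the graph that links every pair closer than the threshold, which a small union-find can compute. The function name `threshold_clusters` and the 4x4 matrix are made up, since the question's matrix is not part of the excerpt.

```python
def threshold_clusters(dist, threshold):
    """Label items so that any pair with distance < threshold, directly or
    through a chain of such pairs, shares a label (single linkage)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    # Convert roots to consecutive labels 1, 2, 3, ... in order of appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(n)]


# Hypothetical symmetric edit-distance matrix for four sequences.
D = [
    [0, 30, 80, 90],
    [30, 0, 45, 95],
    [80, 45, 0, 85],
    [90, 95, 85, 0],
]

clusters_50 = threshold_clusters(D, 50)
print(clusters_50)
```

In this made-up matrix, items 0 and 2 are 80 apart yet end up together at threshold 50 because both are within 50 of item 1; that chaining is the defining behavior of single linkage.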

Error with t-test

余生长醉 submitted on 2019-12-11 16:44:55
Question: I'm having errors with the normal t-test:

    data <- read.table("/Users/vdas/Documents/RNA-Seq_Smaples_Udine_08032013/GBM_29052013/UD_RP_25072013/filteredFPKM_matrix.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
    PGT <- cbind(data[,2],data[,7],data[,24])
    PDGT <- cbind(data[,6],data[,8])
    pval2 <- NULL
    for(i in 1:length(PGT[,1])){
      pval2 <- c(pval2,t.test(as.numeric(PDGT[i,]),as.numeric(PGT[i,]))$p.value)
      print(i)
    }

Error:

    Error in t.test.default(as.numeric(PDGT[i, ]), as.numeric(PGT[i, ])) : not