bioinformatics | 易学教程

Snakemake: Error when trying to generate multiple output files

Question: I'm writing a Snakemake pipeline that takes publicly available SRA files, converts them to FASTQ files, then runs them through alignment, peak calling and LD score regression. I'm having an issue in the rule called SRA2fastq below, in which I use parallel-fastq-dump to convert SRA files to paired-end FASTQ files. This rule generates two outputs for each SRA file, SRRXXXXXXX_1 and SRRXXXXXXX_2. Here is my config file:

    samples:
      fullard2018_NpfcATAC_1: SRR5367824
      fullard2018_NpfcATAC_2: SRR5367798

Comparing one column value to all columns in linux environment

Question: So I have two files, one VCF that looks like

    88 Chr1 25 C - 3 2 1 1
    88 Chr1 88 A T 7 2 1 1
    88 Chr1 92 A C 16 4 1 1

and another with genes that looks like

    GENEID  Start END
    GENE_ID 11    155
    GENE_ID 165   999

I want a script that checks whether a position (3rd column of the VCF file) falls within the range given by the second and third columns of the second file, and then prints it out. What I did so far was to join the files and do

    awk '{if (3>$12 && $3< $13) print }' > out

What I did only compares current
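One way to sketch the range check described in this question, without joining the files first, is to compare every position against every gene interval. The function name `positions_in_genes` is hypothetical, the rows are the sample rows quoted in the excerpt, and the strict comparison (start < position < end) is an assumption taken from the `>` and `<` in the asker's awk attempt.

```python
def positions_in_genes(variants, genes):
    """Pair each variant row with every gene whose range contains its position."""
    hits = []
    for var in variants:
        pos = int(var[2])  # 3rd column of the VCF-like file
        for gene in genes:
            start, end = int(gene[1]), int(gene[2])  # 2nd and 3rd columns of the gene file
            if start < pos < end:  # strict bounds: an assumption, use <= for inclusive ranges
                hits.append((var, gene))
    return hits


# Sample rows copied from the question excerpt (the gene file header is skipped).
variants = [line.split() for line in [
    "88 Chr1 25 C - 3 2 1 1",
    "88 Chr1 88 A T 7 2 1 1",
    "88 Chr1 92 A C 16 4 1 1",
]]
genes = [line.split() for line in ["GENE_ID 11 155", "GENE_ID 165 999"]]

hits = positions_in_genes(variants, genes)
for var, gene in hits:
    print(" ".join(var), "|", " ".join(gene))
```

The nested loop compares each variant with each gene, which is fine for small files; for large inputs a sorted or interval-tree lookup would probably be a better fit.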

(BioPython) How do I stop MemoryError: Out of Memory exception?

Question: I have a program where I take a pair of very large multiple sequence files (>77,000 sequences each, averaging about 1000 bp long), calculate the alignment score between each paired individual element, and write that number into an output file (which I will load into an Excel file later). My code works for small multiple sequence files, but my large master file throws the following traceback after analyzing the 16th pair. Traceback (most recent call last): File "C:\Users\Harry\Documents

Unable to parse just sequences from FASTA file

Question: How can I remove ids like '>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA\n' from sequences? I have this code:

    with open('sequence.fasta', 'r') as f :
        while True:
            line1=f.readline()
            line2=f.readline()
            line3=f.readline()
            if not line3: break
            fct([line1[i:i+100] for i in range(0, len(line1), 100)])
            fct([line2[i:i+100] for i in range(0, len(line2), 100)])
            fct([line3[i:i+100] for i in range(0, len(line3), 100)])

Output: ['>gi|2765658|emb|Z78533.1|CIZ78533 C
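A minimal sketch of one way to keep only the sequences: treat every line starting with `>` as a header to skip, and join the lines in between into one sequence per record. The helper name `read_sequences` and the sequence letters in the example are made up for illustration; only the first header comes from the question. It accepts any iterable of lines, so an open file handle could be passed in the same way.

```python
def read_sequences(lines):
    """Return one string per FASTA record, with header lines dropped."""
    sequences = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header starts a new record: flush the previous one, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


# Illustrative input: the sequence letters below are invented, not from the question.
example = [
    ">gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA\n",
    "ACGTACGT\n",
    "TTGA\n",
    ">second_record hypothetical header\n",
    "GGCC\n",
]
seqs = read_sequences(example)
print(seqs)
```

Unlike reading three lines at a time, this does not assume a fixed number of lines per record.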

merge two data.frame with condition in R

Question: I would like to compare two data sets, df1 and df2, in such a way that the unique characters in df2$ID are added as a new column in df1 and the df2$Xp value is assigned to each gene if the coordinates of df1 overlap with the coordinates of df2:

    df1 <- read.table(text="
    Gene    chr Start End
    Gm12724 4   1000  1105
    Zfhx2   4   1254  1369
    Usp17lc 7   5004  5412
    Lingo1  7   5698  5789
    Sart3   7   5987  6041
    Olfr978 4   1452  1564
    ", header=T)

    df2 <- read.table(text="
    ID    chr Start End  Xp
    S8411 4   989   1258 0.312
    S8411 4   1300

Only call function if PyMOL running

Question: I have a script that performs some calculations on a protein. When it's finished, a method imports the pymol module and uses the pymol.cmd API to display results in a PyMOL session. The process is something akin to the following:

    def display_results(results, protein_fn):
        import pymol
        pymol.cmd.load(protein_fn)
        pymol.cmd.alter(...)
        ...

    protein_fn = "1abc.ent"
    results = analyze_protein(protein_fn)
    display_results(results, protein_fn)

However, my script doesn't necessarily need to display the

How to fix 'String index out of range' error

Question: I am trying to write code that replaces repeating symbols in a string with the symbol and the number of its repeats (like this: "aaaaggggtt" --> "a4g4t2"). But I'm getting a string index out of range error:

    seq = input()
    i = 0
    j = 1
    v = 1
    while j<=len(seq)-1:
        if seq[i] == seq[j]:
            v += 1
            i += 1
            j += 1
        elif seq[i] != seq[j]:
            seq.replace(seq[i-v:j], seq[i] + str(v))
            v = 1
            i += 1
            j += 1
    print(seq)

    line 6, in if seq[i] == seq[j]: IndexError: string index out of range

UPD: After changing len(seq) to
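Independently of the index error, Python strings are immutable, so `seq.replace(...)` returns a new string and leaves `seq` unchanged unless the result is assigned back. A common way to avoid manual index bookkeeping altogether is `itertools.groupby`, which groups consecutive equal symbols. The sketch below is one possible approach rather than a fix of the asker's loop, and the function name `compress` is hypothetical.

```python
from itertools import groupby


def compress(seq):
    """Replace each run of identical symbols with the symbol and its run length."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(seq))


print(compress("aaaaggggtt"))
```

Because `groupby` only merges adjacent equal items, a symbol that reappears later starts a new run, so "aabaa" would become "a2b1a2".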

Is it possible to install bioconductor package 'rain' in R Jupyter notebook?

Question: I want to install the Bioconductor rain package for R in a Jupyter notebook. I am not able to install this package following the instructions given on the website linked above. In an R Jupyter notebook:

    source("https://bioconductor.org/biocLite.R")
    biocLite("rain")

I get the following error:

    Warning message:
    In install.packages(pkgs = doing, lib = lib, ...): installation of package ‘gmp’ had non-zero exit status
    Warning message:
    In install.packages(pkgs = doing, lib = lib, ...):

R: How to change the column names in a data frame based on a specification

Question: I have a data frame, the start of which is below:

                        SM_H1455 SM_V1456 SM_K1457 SM_X1461 SM_K1462
    ENSG00000000419.8        290      270      314      364      240
    ENSG00000000457.8        252      230      242      220      106
    ENSG00000000460.11       154      158      162      136       64
    ENSG00000000938.7      20106    18664    19764    15640    19024
    ENSG00000000971.11        30       10        4        2       10

Note that there are many more columns and rows. Here's what I want to do: I want to change the names of the columns. The most important information in a column's name, e.g. SM_H1455, is the 4th character of the

How do I decide which way to backtrack in the Smith–Waterman algorithm?

Question: I am trying to implement local sequence alignment in Python using the Smith–Waterman algorithm. Here's what I have so far. It gets as far as building the similarity matrix:

    import sys, string
    from numpy import *

    f1=open(sys.argv[1], 'r')
    seq1=f1.readline()
    f1.close()
    seq1=string.strip(seq1)

    f2=open(sys.argv[2], 'r')
    seq2=f2.readline()
    f2.close()
    seq2=string.strip(seq2)

    a,b =len(seq1),len(seq2)
    penalty=-1; point=2;

    #generation of matrix for local alignment
    p=zeros((a+1,b+1))
    # table
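On the backtracking question itself: the usual rule is to start at the highest-scoring cell and, at each step, move to whichever neighbour (diagonal, up, or left) actually produced the current cell's value, stopping when a cell with score 0 is reached. The sketch below is a self-contained Python 3 illustration, not the asker's code. It reuses the names `point` and `penalty` from the excerpt and assumes that `penalty` applies to both mismatches and gaps. When several neighbours could have produced the value, it prefers the diagonal, then up, then left, which is an arbitrary tie-break.

```python
def smith_waterman(seq1, seq2, point=2, penalty=-1):
    """Return (best score, aligned part of seq1, aligned part of seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix, remembering where the maximum occurs.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (point if seq1[i - 1] == seq2[j - 1] else penalty)
            up = score[i - 1][j] + penalty      # gap in seq2
            left = score[i][j - 1] + penalty    # gap in seq1
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned1, aligned2 = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (point if seq1[i - 1] == seq2[j - 1] else penalty)
        if score[i][j] == diag:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + penalty:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1

    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


result = smith_waterman("TTACGTT", "GGACGGG")
print(result)
```

Recomputing the candidate values during traceback avoids storing a separate pointer matrix; storing pointers while filling the matrix is an equally valid design and makes the tie-break explicit at fill time.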