bioinformatics | 易学教程

running BLAST (bl2seq) without creating sequence files

Question: I have a script that performs BLAST queries (bl2seq). The script works like this:

1. Get sequence a and sequence b.
2. Write sequence a to filea.
3. Write sequence b to fileb.
4. Run the command 'bl2seq -i filea -j fileb -n blastn'.
5. Get the output from STDOUT and parse it.
6. Repeat 20 million times.

The program bl2seq does not support piping. Is there any way to do this and avoid writing to and reading from the hard drive? I'm using Python, by the way.

Answer 1: How do you know bl2seq does not support piping? By the way, pipes are an OS feature, not
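One way to keep sequence data off the disk on a POSIX system is to hand the program named pipes (FIFOs) instead of regular files. The sketch below is only an illustration under assumptions: `run_on_fifos` and `bl2seq_cmd` are hypothetical helper names, the command line is copied from the question, and whether bl2seq reads its inputs sequentially (which a FIFO requires) has not been verified here.

```python
import os
import subprocess
import tempfile
import threading


def _feed(path, text):
    # Opening a FIFO for writing blocks until the reader opens the other end.
    with open(path, "w") as fifo:
        fifo.write(text)


def run_on_fifos(make_cmd, seq_a, seq_b):
    """Run a two-input command, supplying both inputs through named pipes."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.fa")
        path_b = os.path.join(tmp, "b.fa")
        os.mkfifo(path_a)  # POSIX only
        os.mkfifo(path_b)
        writers = [
            threading.Thread(target=_feed, args=(path_a, seq_a), daemon=True),
            threading.Thread(target=_feed, args=(path_b, seq_b), daemon=True),
        ]
        for writer in writers:
            writer.start()
        result = subprocess.run(
            make_cmd(path_a, path_b), capture_output=True, text=True, check=True
        )
        for writer in writers:
            writer.join()
        return result.stdout


def bl2seq_cmd(path_a, path_b):
    # Command line as given in the question (assumption: bl2seq accepts FIFOs).
    return ["bl2seq", "-i", path_a, "-j", path_b, "-n", "blastn"]
```

A call would look like `run_on_fifos(bl2seq_cmd, ">a\nACGT\n", ">b\nACGA\n")`. If the program needs to seek within its input files, this approach will not work, and a RAM-backed directory would be the fallback.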

How to fix 'String index out of range' error

I am trying to write code that replaces repeated symbols in a string with the symbol and the number of its repeats (for example, "aaaaggggtt" --> "a4g4t2"), but I'm getting a "string index out of range" error:

    seq = input()
    i = 0
    j = 1
    v = 1
    while j<=len(seq)-1:
        if seq[i] == seq[j]:
            v += 1
            i += 1
            j += 1
        elif seq[i] != seq[j]:
            seq.replace(seq[i-v:j], seq[i] + str(v))
            v = 1
            i += 1
            j += 1
    print(seq)

The traceback reports:

    line 6, in
        if seq[i] == seq[j]:
    IndexError: string index out of range

Update: After changing len(seq) to len(seq)-1 there is no more string index error, but the code still doesn't work. Input: aaaaggggtt Output
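One reason the loop cannot change `seq` is that Python strings are immutable: `str.replace` returns a new string, and the code above discards it. As a sketch of a different approach (not the asker's method), `itertools.groupby` can split the string into runs of identical characters; `compress_runs` is a hypothetical name chosen for illustration.

```python
from itertools import groupby


def compress_runs(seq):
    """Replace each run of identical characters with the character and its run length."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(seq))


print(compress_runs("aaaaggggtt"))  # a4g4t2
```

Note that a single character is written with an explicit count of 1 (for example, "acc" becomes "a1c2"), which matches the format in the question's example.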

Extract sample data from VCF files

Question: I have a large Variant Call Format (VCF) file (> 4 GB) that contains data for several samples. I have searched Google and Stack Overflow and tried the VariantAnnotation package in R to extract data for one particular sample only, but have not found any information on how to do that in R. Has anybody tried something like this, or does anyone know of another package that would enable it?

Answer 1: In VariantAnnotation use a ScanVcfParam to specify the data that you'd like to extract. Using the sample
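The question asks for R, and the answer points to VariantAnnotation. Purely to illustrate what selecting one sample involves, here is a minimal plain-Python sketch that streams a tab-delimited VCF and keeps the nine fixed columns plus one sample column. `extract_sample` is a hypothetical helper, not part of any package, and the sketch assumes an uncompressed, well-formed file.

```python
def extract_sample(lines, sample):
    """Yield VCF lines reduced to the 9 fixed columns plus one sample column."""
    column = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            header = line.split("\t")
            # Sample names start at the tenth column; ValueError if absent.
            column = 9 + header[9:].index(sample)
            yield "\t".join(header[:9] + [sample])
        elif line:
            if column is None:
                raise ValueError("data line seen before the #CHROM header")
            fields = line.split("\t")
            yield "\t".join(fields[:9] + [fields[column]])
```

Because the function is a generator, a multi-gigabyte file can be processed one line at a time by passing an open file handle as `lines`.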

merge two data.frame with condition in R

I would like to compare two data sets, df1 and df2, so that each unique value of df2$ID is added as a new column in df1 and holds the df2$Xp value for each gene whose coordinates in df1 overlap the coordinates in df2:

    df1 <- read.table(text="
    Gene chr Start End
    Gm12724 4 1000 1105
    Zfhx2 4 1254 1369
    Usp17lc 7 5004 5412
    Lingo1 7 5698 5789
    Sart3 7 5987 6041
    Olfr978 4 1452 1564
    ", header=T)

    df2 <- read.table(text="
    ID chr Start End Xp
    S8411 4 989 1258 0.312
    S8411 4 1300 1800 0.144
    S8411 7 5641 6874 0.136
    S8413 4 1307 1360 -1.999
    ",header=T)

Expected output:

    df3 <- read
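The core of the task is an interval-overlap test restricted to the same chromosome. The sketch below shows that logic in plain Python on a subset of the rows from the question. The expected output is cut off in the excerpt, so the result shape is an assumption: one entry per unique ID, holding a list of Xp values, because a gene can overlap more than one segment with the same ID. `overlaps` and `annotate` are hypothetical names.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Closed intervals overlap when each one starts before the other ends."""
    return start_a <= end_b and start_b <= end_a


def annotate(genes, segments):
    """For each gene, collect the Xp values of overlapping segments, keyed by ID."""
    ids = sorted({seg["ID"] for seg in segments})
    rows = []
    for gene in genes:
        row = dict(gene)
        for seg_id in ids:
            row[seg_id] = [
                seg["Xp"]
                for seg in segments
                if seg["ID"] == seg_id
                and seg["chr"] == gene["chr"]
                and overlaps(gene["Start"], gene["End"], seg["Start"], seg["End"])
            ]
        rows.append(row)
    return rows


# Subset of the rows shown in the question.
genes = [
    {"Gene": "Gm12724", "chr": 4, "Start": 1000, "End": 1105},
    {"Gene": "Zfhx2", "chr": 4, "Start": 1254, "End": 1369},
    {"Gene": "Lingo1", "chr": 7, "Start": 5698, "End": 5789},
]
segments = [
    {"ID": "S8411", "chr": 4, "Start": 989, "End": 1258, "Xp": 0.312},
    {"ID": "S8411", "chr": 4, "Start": 1300, "End": 1800, "Xp": 0.144},
    {"ID": "S8411", "chr": 7, "Start": 5641, "End": 6874, "Xp": 0.136},
    {"ID": "S8413", "chr": 4, "Start": 1307, "End": 1360, "Xp": -1.999},
]

for row in annotate(genes, segments):
    print(row["Gene"], row["S8411"], row["S8413"])
```

Zfhx2 overlaps two S8411 segments, which is why a list is used here rather than a single value. How such ties should be resolved depends on the intended output.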

Python find longest ORF in DNA sequence

Question: Can someone show me a straightforward solution for how to calculate the longest open reading frame (ORF) in a DNA sequence? ATG is the start codon (i.e., the beginning of an ORF) and TAG, TGA, and TAA are stop codons (i.e., the end of an ORF). Here's some code that produces errors (and uses an external module called BioPython):

    import sys
    from Bio import SeqIO

    currentCid = ''
    buffer = []
    for record in SeqIO.parse(open(sys.argv[1]),"fasta"):
        cid = str(record.description).split('.')[0][1:]
        if
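A minimal sketch of the ORF search itself, without BioPython, could scan the three forward reading frames codon by codon. The assumptions are stated in the code: only the forward strand is searched, an ORF must end in a stop codon, and the stop codon is included in the result. `longest_orf` is a hypothetical name.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def longest_orf(dna):
    """Return the longest ATG...stop ORF on the forward strand (stop codon included)."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGAAATAGATGTAA"))  # ATGAAATAG
```

To cover the reverse strand as well, the same function could be applied to the reverse complement of the sequence and the longer of the two results kept.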

How to set a for -loop in R

Question: I am a biologist and have little programming knowledge. I have a series of files (FASTA format) to which I need to apply an R package. The contents of each file are as follows:

FILE_1.FASTA

    >>TTBK2_Hsap ,(CK1/TTBK)
    MSGGGEQLDILSVGILVKERWKVLRKIGGGGFGEIYDALDMLTRENVALKVESAQQPKQVLKMEVAVLKKLQGKDHVCRFIGCGRNDRFNYVVMQLQGRNLADLRRSQSRGTFT

FILE_2.FASTA

    >>TTBK2_Hsap ,(CK1/TTBK)
    MSGGGEQLDILSVGILVKERWKVLRKIGGGGFGEIYDALDMLTRENVALKVESAQQPKQVLKMEVAVLKKLQGKDHVCRFIGCGRNDRFNYVVMQLQGRNLADLRRSQSRGTFT

and the package

How can I get taxonomic rank names from taxid?

Question: This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid? The solution given there works, but I would like to have the names for each taxonomic ID at the defined ranks. I have found this in ete3, which can do the job:

    names = ncbi.get_taxid_translator(lineage)
    print [names[taxid] for taxid in lineage]

but, not being a Python programmer, I am failing to incorporate this into the code given in the link above. Here is what I
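The code from the linked question is not shown in this excerpt, so the following is only a sketch of how the pieces might fit together. It relies on three methods of ete3's `NCBITaxa` class: `get_lineage`, `get_rank` and `get_taxid_translator`. `rank_names` is a hypothetical helper, and the `ncbi` object is passed in as a parameter so the function itself does not import ete3.

```python
def rank_names(ncbi, taxid, ranks):
    """Map each requested rank to the scientific name found in the taxid's lineage."""
    lineage = ncbi.get_lineage(taxid)              # list of taxids, root first
    rank_of = ncbi.get_rank(lineage)               # {taxid: rank}
    name_of = ncbi.get_taxid_translator(lineage)   # {taxid: scientific name}
    found = {rank_of[t]: name_of[t] for t in lineage}
    # Ranks missing from the lineage are reported as None.
    return {rank: found.get(rank) for rank in ranks}
```

With ete3 installed, this would be called as `rank_names(NCBITaxa(), 9606, ["phylum", "class", "order", "family", "genus", "species"])` after `from ete3 import NCBITaxa`. Creating `NCBITaxa()` for the first time downloads the NCBI taxonomy database.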

Find, replace, and increment at each occurrence of string

Question: I'm relatively new to scripting and apologize in advance for this painfully simple problem. I believe I've searched pretty thoroughly, but apparently no other answers or cookbooks have been explicit enough for me to understand (like here - still couldn't get it). I have a file that is made up of strings of letters (DNA, if you care), one string per line. Above each string I've inserted another line to identify the underlying string. For those of you who are bioinformaticians, I'm trying to
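The excerpt stops before the exact goal is stated, so this sketch covers only what the title describes: finding every occurrence of a string and appending an incrementing number. The identifier text `>seq` in the usage line is a made-up placeholder, and `number_occurrences` is a hypothetical name.

```python
import itertools
import re


def number_occurrences(text, target, start=1):
    """Append a running counter to every occurrence of `target` in `text`."""
    counter = itertools.count(start)
    # re.sub calls the function once per match, in order of appearance.
    return re.sub(re.escape(target), lambda m: f"{m.group(0)}{next(counter)}", text)


print(number_occurrences(">seq\nACGT\n>seq\nTTGA\n", ">seq"))
```

Passing a function as the replacement argument of `re.sub` is what allows each match to receive a different number.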

Is it possible to install bioconductor package 'rain' in R Jupyter notebook?

I want to install the Bioconductor rain package for R in a Jupyter notebook. I am not able to install it by following the instructions given on the website linked above. In an R Jupyter notebook:

    source("https://bioconductor.org/biocLite.R")
    biocLite("rain")

I get the following error:

    Warning message:
    In install.packages(pkgs = doing, lib = lib, ...):
      installation of package ‘gmp’ had non-zero exit status
    Warning message:
    In install.packages(pkgs = doing, lib = lib, ...):
      installation of package ‘rain’ had non-zero exit status

I was able to install a different Bioconductor

Grouping ecological data in R

Question: I'm looking at some ecological data (diet) and trying to work out how to group by Predator. I would like to be able to extract the data so that I can look at the weights of each individual prey for each species for each predator, i.e., work out the mean weight of each species eaten by, e.g., Predator 117. I've put a sample of my data below.

      Predator PreySpecies  PreyWeight
    1      114          10   4.2035496
    2      114          10   1.6307026
    3      115           1 407.7279775
    4      115           1 255.5430495
    5      117          10   4.2503708
    6      117          10   3.6268814
    7      117          10 6
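The question is about R, but the grouping idea is the same in any language: collect the weights under a (Predator, PreySpecies) key, then average each group. The sketch below does this in plain Python using the six complete rows from the sample (the seventh row is cut off in the excerpt and is left out). `mean_weight_by_group` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def mean_weight_by_group(rows):
    """Average PreyWeight for every (Predator, PreySpecies) pair."""
    groups = defaultdict(list)
    for predator, species, weight in rows:
        groups[(predator, species)].append(weight)
    return {key: mean(weights) for key, weights in groups.items()}


# Complete rows from the sample in the question.
rows = [
    (114, 10, 4.2035496),
    (114, 10, 1.6307026),
    (115, 1, 407.7279775),
    (115, 1, 255.5430495),
    (117, 10, 4.2503708),
    (117, 10, 3.6268814),
]
means = mean_weight_by_group(rows)
for (predator, species), value in sorted(means.items()):
    print(predator, species, round(value, 4))
```

Keeping the per-group lists in `groups` also gives access to the individual prey weights, which the question mentions wanting to inspect.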