bioinformatics

Examples for Topological Sorting on Large DAGs

有些话、适合烂在心里 submitted on 2019-12-20 18:44:21
Question: I am looking for real-world applications where topological sorting is performed on large graphs. Some fields where I imagine you could find such instances are bioinformatics, dependency resolution, databases, hardware design, data warehousing... but I hope some of you have encountered or heard of specific algorithms/projects/applications/datasets that require topsort. Even if the data/project may not be publicly accessible any hints (and estimates on the order of magnitude of
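
For reference, the operation being asked about can be sketched with Kahn's algorithm, which runs in time linear in the number of nodes plus edges. This is only an illustration: the dict-of-successors graph representation and the name `topo_sort` are assumptions, not something taken from the question.

```python
from collections import deque


def topo_sort(graph):
    """Return the nodes of a DAG in topological order (Kahn's algorithm).

    `graph` maps each node to an iterable of its successors (hypothetical
    representation). Raises ValueError if the graph contains a cycle.
    """
    # Count incoming edges for every node, including nodes that only
    # appear as successors.
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    # Start from the nodes that have no prerequisites.
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph.get(node, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)

    # Any node left unprocessed sits on a cycle.
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order
```

For graphs too large for memory, the same idea would need an external-memory or streaming variant, which this sketch does not attempt.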

Subset a file by row and column numbers

眉间皱痕 submitted on 2019-12-20 09:53:20
Question: We want to subset a text file by rows and columns, where the row and column numbers are read from a file, excluding the header (row 1) and row names (col 1).

inputFile.txt (tab-delimited text file):

    header  62  9   3   54  6   1
    25      1   2   3   4   5   6
    96      1   1   1   1   0   1
    72      3   3   3   3   3   3
    18      0   1   0   1   1   0
    82      1   0   0   0   0   1
    77      1   0   1   0   1   1
    15      7   7   7   7   7   7
    82      0   0   1   1   1   0
    37      0   1   0   0   1   0
    18      0   1   0   0   1   0
    53      0   0   1   0   0   0
    57      1   1   1   1   1   1

subsetCols.txt: comma-separated with no spaces, one row, numbers ordered. In real data we have 500K columns,
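
One possible approach, sketched in Python under stated assumptions: the row and column numbers are taken to be 1-based and to count only data rows and columns (the header row and the row-name column are not counted and are always kept). The question is cut off before it pins this down, so adjust the offsets if the numbering differs. The function name `subset_table` is hypothetical.

```python
import csv


def subset_table(lines, rows, cols, delimiter="\t"):
    """Keep the header row and row-name column plus the selected cells.

    `lines` is any iterable of text lines (e.g. an open file).
    `rows` and `cols` are 1-based and count only data rows/columns,
    i.e. the header row and the row-name column are not counted
    (an assumption; the source does not specify the numbering).
    """
    wanted_rows = set(rows)
    col_idx = [0] + sorted(cols)  # column 0 holds the row names
    out = []
    # Rows are streamed one at a time, so the whole file is never loaded.
    for i, fields in enumerate(csv.reader(lines, delimiter=delimiter)):
        if i == 0 or i in wanted_rows:  # i == 0 is the header row
            out.append([fields[j] for j in col_idx])
    return out
```

With a real file this could be called as `subset_table(open("inputFile.txt"), rows, cols)`, where `rows` and `cols` are parsed from the subset files beforehand.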

Snakemake: unknown output/input files after splitting by chromosome

穿精又带淫゛_ submitted on 2019-12-20 03:36:32
Question: To speed up a certain snakemake step I would like to:
- split my BAM file per chromosome using bamtools split -in sample.bam --reference, which results in files named sample.REF_{chromosome}.bam
- perform variant calling on each, resulting in e.g. sample.REF_{chromosome}.vcf
- recombine the obtained VCF files using vcf-concat (VCFtools): vcf-concat file1.vcf file2.vcf file3.vcf > sample.vcf
The problem is that I don't know a priori which chromosomes may be in my BAM file. So I cannot specify

Bash: replace part of filename

风格不统一 submitted on 2019-12-20 03:25:30
Question: I have a command I want to run on all of the files in a folder, and the command's syntax looks like this: tophat -o <output_file> <input_file>. What I would like to write is a script that loops over all the files in an arbitrary folder and uses the input file names to create similar, but different, output file names. The file names look like this:

    input name               desired output name
    path/to/sample1.fastq    path/to/sample1.bam
    path/to/sample2.fastq    path/to/sample2.bam

Getting the input to work
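
The renaming step can be sketched as follows. Note that this uses Python's `pathlib` rather than Bash, and that the helper names `output_name` and `plan_commands` are hypothetical. The sketch only builds the argument lists for the `tophat -o <output_file> <input_file>` form quoted above; it does not run anything.

```python
from pathlib import Path


def output_name(input_path, new_suffix=".bam"):
    """Swap the final extension: path/to/sample1.fastq -> path/to/sample1.bam."""
    return Path(input_path).with_suffix(new_suffix)


def plan_commands(folder, pattern="*.fastq"):
    """Build one `tophat -o <output_file> <input_file>` argument list per input.

    Files are matched by `pattern` inside `folder` and processed in sorted
    order. Nothing is executed here.
    """
    return [
        ["tophat", "-o", str(output_name(p)), str(p)]
        for p in sorted(Path(folder).glob(pattern))
    ]
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` to execute the command.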

R Bioconductor installation error - Line starting '< DOCTYPE html PUBLI …' is malformed

时光总嘲笑我的痴心妄想 submitted on 2019-12-19 10:48:06
Question: I'm having trouble installing Bioconductor packages in R. This is on Mac OS X, a fresh install of R 2.15, and using Bioconductor 1.4.4. Transcript follows:

    > source("http://bioconductor.org/biocLite.R")
    BiocInstaller version 1.4.4, ?biocLite for help
    > biocLite("Biobase")
    BioC_mirror: http://bioconductor.org
    Using R version 2.15, BiocInstaller version 1.4.4.
    Warning: unable to access index for repository http://brainarray.mbni.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.15
    Installing

SMILES from graph

你离开我真会死。 submitted on 2019-12-18 04:54:14
Question: Is there a method or package that converts a graph (or adjacency matrix) into a SMILES string? For instance, I know the atoms are [6 6 7 6 6 6 6 8] ([C C N C C C C O]), and the adjacency matrix is

    [[ 0., 1., 0., 0., 0., 0., 0., 0.],
     [ 1., 0., 2., 0., 0., 0., 0., 1.],
     [ 0., 2., 0., 1., 0., 0., 0., 0.],
     [ 0., 0., 1., 0., 1., 0., 0., 0.],
     [ 0., 0., 0., 1., 0., 1., 0., 0.],
     [ 0., 0., 0., 0., 1., 0., 1., 1.],
     [ 0., 0., 0., 0., 0., 1., 0., 0.],
     [ 0., 1., 0., 0., 0., 1., 0., 0.]]

I need some

Split a column to multiple columns

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-17 20:56:34
Question: I have a table whose first column is:

    chr10:100002872-100002872
    chr10:100003981-100003981
    chr10:100004774-100004774
    chr10:100005285-100005285
    chr10:100007123-100007123

I want to convert it into 3 separate columns, but I couldn't work out how to specify ":" and "-" as delimiters for the strsplit command. What should I do?

Answer 1: Here's one way:

    library(data.table)
    DF[, paste0("V1.",1:3) ] <- tstrsplit(DF$V1, ":|-")
    #                          V1  V1.1      V1.2      V1.3
    # 1 chr10:100002872-100002872 chr10 100002872 100002872
    # 2 chr10:100003981-100003981 chr10
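
The same idea, splitting on either ":" or "-" with a regular expression, can be sketched in Python. This is not part of the original answer; the function name `split_region` is hypothetical, and the sketch assumes the chromosome name itself contains neither delimiter.

```python
import re


def split_region(region):
    """Split 'chr10:100002872-100002872' into (chrom, start, end).

    Assumes exactly one ':' and one '-' in the string; a chromosome name
    containing either character would make the unpacking fail.
    """
    chrom, start, end = re.split(r"[:-]", region)
    return chrom, int(start), int(end)


def split_column(regions):
    """Turn a list of region strings into three parallel columns."""
    chroms, starts, ends = zip(*(split_region(r) for r in regions))
    return list(chroms), list(starts), list(ends)
```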

Processing the input file based on range overlap

亡梦爱人 submitted on 2019-12-17 20:16:18
Question: I have a huge input file (a representative sample of which is shown below as input):

    > input
      CT1           CT2            CT3
    1 chr1:200-400  chr1:250-450   chr1:400-800
    2 chr1:800-970  chr2:200-500   chr1:700-870
    3 chr2:300-700  chr2:600-1000  chr2:700-1400

I want to process it by following some rules (described below) so that I get an output like:

    > output
                   CT1 CT2 CT3
    chr1:200-400   1   1   0
    chr1:800-970   1   0   0
    chr2:300-700   1   1   0
    chr1:250-450   1   1   0
    chr2:200-500   1   1   0
    chr2:600-1000  0   1   1
    chr1:400-800   0   0   1
    chr1:700-870   0   1   1

Find overlapping regions and extract respective value

时光毁灭记忆、已成空白 submitted on 2019-12-17 16:59:23
Question: How do you find the overlapping coordinates and extract the respective seg.mean values for the overlapping region?

data1

    Rl  pValue    chr  start      end        CNA
    2   2.594433  6    129740000  129780000  gain
    2   3.941399  6    130080000  130380000  gain
    1   1.992114  10   80900000   81100000   gain
    1   7.175750  16   44780000   44920000   gain

data2

    ID    chrom  loc.start  loc.end    num.mark  seg.mean
    8410  6      129750000  129760000  8430      0.0039
    8410  10     80907000   81000000   5         -1.7738
    8410  16     44790000   44910000   12        0.0110

dataoutput

    Rl  pValue  chr  start  end
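
The overlap test itself can be sketched in Python as below. The tuple layouts, the function name `overlapping_seg_means`, and the choice to treat intervals as closed (touching endpoints count as overlapping) are all assumptions; the desired output format is cut off in the question, so the sketch simply returns the matching seg.mean values per region.

```python
def overlapping_seg_means(regions, segments):
    """For each region, list the seg.mean of every overlapping segment.

    `regions` holds (chrom, start, end) tuples, as in data1.
    `segments` holds (chrom, loc_start, loc_end, seg_mean) tuples, as in data2.
    Intervals are treated as closed (an assumption).
    """
    # Group segments by chromosome so each region only scans its own chromosome.
    by_chrom = {}
    for chrom, seg_start, seg_end, seg_mean in segments:
        by_chrom.setdefault(chrom, []).append((seg_start, seg_end, seg_mean))

    result = []
    for chrom, start, end in regions:
        means = [
            mean
            for seg_start, seg_end, mean in by_chrom.get(chrom, [])
            # Two closed intervals overlap when each starts before the other ends.
            if seg_start <= end and seg_end >= start
        ]
        result.append(((chrom, start, end), means))
    return result
```

This is a simple scan per chromosome; for many segments, a sorted or interval-tree lookup would scale better.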

Using the reserved word “class” as field name in Django and Django REST Framework

社会主义新天地 submitted on 2019-12-17 16:35:44
Question: Description of the problem: Taxonomy is the science of defining and naming groups of biological organisms on the basis of shared characteristics. Organisms are grouped together into taxa (singular: taxon) and these groups are given a taxonomic rank. The principal ranks in modern use are domain, kingdom, phylum, class, order, family, genus and species. More information on Taxonomy and Taxonomic ranks in Wikipedia. Following the example for the red fox in the article Taxonomic rank in Wikipedia