snakemake

How to use a list in a Snakemake tabular configuration to describe sequencing units for a bioinformatics pipeline

Submitted by 橙三吉。 on 2019-12-02 04:29:00

Question: How can I use a list in a Snakemake tabular config? I use a Snakemake tabular configuration (mapping with BWA mem) to describe my sequencing units (libraries sequenced on separate lanes). At the next stage of the analysis I have to merge the sequencing units (mapped .bam files) into merged .bam files (one for each sample). For now I am using a YAML config to describe which units belong to which samples, but I would like to use the tabular config for this purpose; it is not clear to me how to write and recall a list
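A minimal sketch of one way to do this, assuming a tab-separated units.tsv with "sample" and "unit" columns (the column names and file paths below are assumptions, not from the question): load the table with pandas and let an input function collect every unit file that belongs to one sample.

    import pandas as pd

    # Hypothetical units table: one row per sequencing unit (sample + lane).
    units = pd.read_table("units.tsv", dtype=str).set_index(
        ["sample", "unit"], drop=False)

    def sample_units(wildcards):
        # All mapped unit files belonging to this sample.
        return expand("mapped/{sample}_{unit}.bam",
                      sample=wildcards.sample,
                      unit=units.loc[wildcards.sample, "unit"])

    rule merge_units:
        input:
            sample_units
        output:
            "merged/{sample}.bam"
        shell:
            "samtools merge {output} {input}"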

Thread.py error snakemake

Submitted by 旧巷老猫 on 2019-12-02 03:24:25

I am trying to run a simple one-rule snakemake file as follows:

    resources_dir = 'resources'

    rule downloadReference:
        output:
            fa = resources_dir + '/human_g1k_v37.fasta',
            fai = resources_dir + '/human_g1k_v37.fasta.fai',
        shell:
            ('mkdir -p ' + resources_dir + '; cd ' + resources_dir + '; '
             + 'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; '
             + 'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;')

But I get an error: Error in job downloadReference while creating output files resources
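One plausible cause (an assumption here, since the excerpt cuts off before the full traceback) is that a command in the chain exits non-zero: Snakemake runs shell commands in bash strict mode, so a single failing command (gunzip, for instance, returns a non-zero status if the archive has trailing garbage) aborts the job and deletes the declared outputs. A hedged sketch of a more defensive version of the rule, writing outputs to their declared paths with wget -O instead of relying on cd:

    rule downloadReference:
        params:
            url = 'ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference'
        output:
            fa = 'resources/human_g1k_v37.fasta',
            fai = 'resources/human_g1k_v37.fasta.fai',
        shell:
            """
            wget -O {output.fa}.gz {params.url}/human_g1k_v37.fasta.gz
            # Tolerate a non-zero gunzip exit (assumption: this is what
            # makes the original rule fail under strict mode).
            gunzip -f {output.fa}.gz || true
            wget -O {output.fai} {params.url}/human_g1k_v37.fasta.fai
            """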

What would be an elegant way of preventing snakemake from failing upon shell/R error?

Submitted by 天涯浪子 on 2019-12-02 01:20:41

I would like my snakemake workflows to continue running even when certain rules fail. For example, I use a variety of tools to perform peak calling on ChIP-seq data, but certain programs issue an error when they cannot identify any peaks. In such cases I would prefer to create an empty output file (as some peak callers already do) rather than have snakemake fail. Is there a snakemake-like way of handling such cases using the "shell" and "run" keywords? Thanks

For shell commands, you can always take advantage of the shell's conditional "or" operator, ||:

    rule some_rule:
        output
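A complete, minimal sketch of that fallback pattern (the peak caller command is a hypothetical placeholder): if the caller exits non-zero, create an empty output instead of failing the job.

    rule some_rule:
        input:
            "mapped/{sample}.bam"
        output:
            "peaks/{sample}.narrowPeak"
        shell:
            """
            # peak_caller stands in for macs2 or a similar tool; touch
            # creates an empty result when the caller reports no peaks.
            peak_caller --input {input} --output {output} || touch {output}
            """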

How to select all files from one sample?

Submitted by 对着背影说爱祢 on 2019-12-02 00:22:43

I have a problem figuring out how to make the input directive select only the files belonging to the current {samples} wildcard in the rule below:

    rule MarkDup:
        input:
            expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam",
                   zip,
                   samples=samples['sample'],
                   lanes=samples['lane'],
                   flowcells=samples['flowcell']),
        output:
            bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
            metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
        shell:
            "gatk --java-options -Djava.io.tempdir=`pwd`/tmp \
            MarkDuplicates \
            $(echo ' {input}' | sed 's/ / --INPUT /g') \
            -O {output.bam} \
            --VALIDATION
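One way to get per-sample inputs (a sketch under stated assumptions, not necessarily the accepted answer): replace the blanket expand with an input function that filters the sample sheet down to the rows of the current {samples} wildcard. This assumes `samples` is a pandas DataFrame with "sample", "lane" and "flowcell" columns, as the question's expand call suggests.

    def sample_bams(wildcards):
        # Only the merged bams for this sample, across all its lanes/flowcells.
        rows = samples[samples['sample'] == wildcards.samples]
        return expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam",
                      zip,
                      samples=rows['sample'],
                      lanes=rows['lane'],
                      flowcells=rows['flowcell'])

    rule MarkDup:
        input:
            sample_bams
        output:
            bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
            metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
        # shell command as in the question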

Parallelizing snakemake rule

Submitted by ﹥>﹥吖頭↗ on 2019-12-01 20:45:34

Sorry if this is a naive question, but I'm still trying to wrap my head around the intricacies of Snakemake. I have a directory containing a number of files that I want to apply a rule to in parallel (i.e. I want to submit the same script to the cluster, specifying a different input file for each submission). I first tried using expand for the input files, but this resulted in only one job submission:

    CHROMS = [str(c) for c in range(1, 23)] + ["X"]

    rule vep:
        input:
            expand("data/split/chr{chrom}.vcf", chrom=CHROMS)
        output:
            expand("data/vep/split/chr{chrom}.ann.vcf", chrom=CHROMS)
        shell:
            "vep "
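A minimal sketch of the usual fix: keep {chrom} as a wildcard inside the rule so each chromosome becomes its own job, and move the expand() into a target rule that requests all of them (the vep command line below is abbreviated, as in the question).

    CHROMS = [str(c) for c in range(1, 23)] + ["X"]

    rule all:
        input:
            expand("data/vep/split/chr{chrom}.ann.vcf", chrom=CHROMS)

    rule vep:
        input:
            "data/split/chr{chrom}.vcf"
        output:
            "data/vep/split/chr{chrom}.ann.vcf"
        shell:
            "vep -i {input} -o {output}"

With this layout, running snakemake with --jobs 23 submits one cluster job per chromosome.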

Snakemake: How to save and access sample details in config.yml file?

Submitted by 佐手、 on 2019-12-01 11:24:35

Can anybody help me understand whether it is possible to access sample details from a config.yml file when the sample names are not written into the snakemake workflow itself? This is so I can re-use the workflow for different projects and only adjust the config file. Let me give you an example: I have four samples that belong together and should be analyzed together; they are called sample1-4. Every sample comes with some more information, but to keep it simple here let's say it's just a name tag such as S1, S2, etc. My config.yml file could look like this:

    samples: ["sample1","sample2","sample3","sample4"]
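A minimal sketch of one way to keep sample details in the config alone, assuming a mapping from each sample name to its details (the structure below is an assumption, not the asker's actual file):

    # config.yml (hypothetical):
    # samples:
    #   sample1: {tag: S1}
    #   sample2: {tag: S2}
    #   sample3: {tag: S3}
    #   sample4: {tag: S4}

    configfile: "config.yml"

    rule all:
        input:
            # Iterating over the dict yields the sample names.
            expand("results/{sample}.txt", sample=config["samples"])

    rule tag_sample:
        output:
            "results/{sample}.txt"
        params:
            tag = lambda wildcards: config["samples"][wildcards.sample]["tag"]
        shell:
            "echo {params.tag} > {output}"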

How to do a partial expand in Snakemake?

Submitted by £可爱£侵袭症+ on 2019-12-01 08:07:08

I'm trying to first generate 4 files, for the LETTERS x NUMS combinations, then summarize over the NUMS to obtain one file per element in LETTERS:

    LETTERS = ["A", "B"]
    NUMS = ["1", "2"]

    rule all:
        input:
            expand("combined_{letter}.txt", letter=LETTERS)

    rule generate_text:
        output:
            "text_{letter}_{num}.txt"
        shell:
            """
            echo "test" > {output}
            """

    rule combine_text:
        input:
            expand("text_{letter}_{num}.txt", num=NUMS)
        output:
            "combined_{letter}.txt"
        shell:
            """
            cat {input} > {output}
            """

Executing this snakefile results in the following error:

    WildcardError in line 19 of /tmp/Snakefile:
    No values given
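A minimal sketch of the partial expand: escape the wildcard you want to keep by doubling its braces, so expand() only fills in {num} and leaves {letter} for Snakemake to resolve (newer Snakemake versions also accept allow_missing=True for the same effect).

    rule combine_text:
        input:
            # {{letter}} survives expand() as the literal wildcard {letter}.
            expand("text_{{letter}}_{num}.txt", num=NUMS)
        output:
            "combined_{letter}.txt"
        shell:
            "cat {input} > {output}"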

Handling SIGPIPE error in snakemake

Submitted by 别来无恙 on 2019-12-01 05:22:18

Question: The following snakemake script:

    rule all:
        input:
            'test.done'

    rule pipe:
        output:
            'test.done'
        shell:
            """
            seq 1 10000 | head > test.done
            """

fails with the following error:

    snakemake -s test.snake
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       all
        1       pipe
        2
    rule pipe:
        output: test.done
        jobid: 1

    Error in job pipe while creating output file test.done.
    RuleException:
    CalledProcessError in line 9 of /Users/db291g/Tritume/test.snake:
    Command ' seq 1 10000 |
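The failure happens because Snakemake executes shell commands in bash strict mode (set -euo pipefail): head exits after ten lines, seq receives SIGPIPE and exits with status 141, and pipefail turns that into a job failure. A minimal sketch of one workaround, disabling pipefail for this command:

    rule pipe:
        output:
            'test.done'
        shell:
            """
            # Let the pipeline's last command decide the exit status,
            # so seq's SIGPIPE (exit 141) no longer fails the job.
            set +o pipefail
            seq 1 10000 | head > {output}
            """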