snakemake

Use of Snakemake workflows in AWS Batch

Submitted by 三世轮回 on 2019-12-23 15:13:00
Question: I wanted to ask the Snakemake community whether anybody has had success implementing Snakemake workflows in AWS Batch. Page 4 of a recent publication from Oct 2018 seems to suggest that Snakemake does not work on AWS because it cannot handle resource management. Here is the publication: Tibanna: software for scalable execution of portable pipelines on the cloud - https://www.biorxiv.org/content/early/2019/04/29/440974.full.pdf Yes, the same paper does suggest that Snakemake works well with the Google
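For context, more recent Snakemake releases document AWS execution via Tibanna rather than AWS Batch. A minimal sketch of such an invocation, assuming a Snakemake version with Tibanna support and an already deployed Tibanna Unicorn (the bucket and subdirectory names are placeholders):

    # hypothetical invocation; the available flags depend on the Snakemake version installed
    snakemake --tibanna \
              --default-remote-prefix=my-bucket/my-workflow-run \
              --use-conda \
              --jobs 32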

snakemake: how to implement log directive when using run directive?

Submitted by 半世苍凉 on 2019-12-23 03:10:23
Question: Snakemake allows creation of a log for each rule with the log parameter, which specifies the name of the log file. It is relatively straightforward to pipe results from shell output to this log, but I am not able to figure out a way of logging the output of the run directive (i.e. Python code). One workaround is to save the Python code in a script and then run it from the shell, but I wonder if there is another way? Answer 1: I have some rules that use both the log and run directives. In the run directive, I
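A hedged sketch of that pattern (file names are made up): inside a run block the log file is available as log[0], so the Python code can open it and redirect its own output there, and any shell() calls can append to the same file:

    rule process:
        input:
            "data/{sample}.txt"
        output:
            "results/{sample}.txt"
        log:
            "logs/{sample}.log"
        run:
            import sys
            orig_out, orig_err = sys.stdout, sys.stderr
            with open(log[0], "w") as lf:
                # print() output and tracebacks from the python code land in the log
                sys.stdout = sys.stderr = lf
                print("processing", input[0])
                # ... python work here ...
                # shell() calls can append their output to the same log file
                shell("cp {input} {output} >> {log} 2>&1")
            sys.stdout, sys.stderr = orig_out, orig_err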

Snakemake: unknown output/input files after splitting by chromosome

Submitted by 穿精又带淫゛_ on 2019-12-20 03:36:32
Question: To speed up a certain snakemake step I would like to: (1) split my bam file per chromosome using bamtools split -in sample.bam --reference, which results in files named sample.REF_{chromosome}.bam; (2) perform variant calling on each, resulting in e.g. sample.REF_{chromosome}.vcf; (3) recombine the obtained vcf files using vcf-concat (VCFtools): vcf-concat file1.vcf file2.vcf file3.vcf > sample.vcf. The problem is that I don't know a priori which chromosomes may be in my bam file, so I cannot specify
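The usual answer is a checkpoint whose output is a directory, plus an input function that discovers the split files once the checkpoint has run. A rough sketch, assuming bamtools split accepts a -stub prefix and writes <stub>.REF_<chrom>.bam files (paths and the variant-calling command are placeholders):

    import os

    checkpoint split_bam:
        input:
            "mapped/{sample}.bam"
        output:
            directory("split/{sample}")
        shell:
            "mkdir -p {output} && "
            "bamtools split -in {input} --reference -stub {output}/{wildcards.sample}"

    rule call_variants:
        input:
            "split/{sample}/{sample}.REF_{chrom}.bam"
        output:
            "vcf/{sample}.REF_{chrom}.vcf"
        shell:
            "variant_caller {input} > {output}"    # placeholder command

    def gathered_vcfs(wildcards):
        # only evaluated after the checkpoint has finished, so the chromosomes are known
        split_dir = checkpoints.split_bam.get(sample=wildcards.sample).output[0]
        chroms = glob_wildcards(
            os.path.join(split_dir, wildcards.sample + ".REF_{chrom}.bam")).chrom
        return expand("vcf/{sample}.REF_{chrom}.vcf",
                      sample=wildcards.sample, chrom=chroms)

    rule concat_vcfs:
        input:
            gathered_vcfs
        output:
            "vcf/{sample}.vcf"
        shell:
            "vcf-concat {input} > {output}"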

What would be an elegant way of preventing snakemake from failing upon shell/R error?

Submitted by 狂风中的少年 on 2019-12-20 03:22:08
Question: I would like my snakemake workflows to continue running even when certain rules fail. For example, I'm using a variety of tools to perform peak calling of ChIP-seq data, but certain programs issue an error when they are not able to identify peaks. In such cases I would prefer to create an empty output file (as some peak callers already do) rather than have snakemake fail. Is there a snakemake-like way of handling such cases, using the "shell" and "run" keywords?
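Two hedged sketches of the usual workarounds (the peak-caller commands are placeholders). In a shell directive the exit status can be swallowed and an empty output created instead; in a run directive the same idea fits in a try/except:

    rule call_peaks:
        input:
            "bam/{sample}.bam"
        output:
            "peaks/{sample}.narrowPeak"
        # if the caller exits non-zero, create an empty output instead of failing
        shell:
            "peak_caller --treatment {input} --out {output} || touch {output}"

    rule call_peaks_run:
        input:
            "bam/{sample}.bam"
        output:
            "peaks_alt/{sample}.bed"
        run:
            try:
                shell("other_caller {input} > {output}")   # placeholder command
            except Exception:
                shell("touch {output}")

Independently of this, snakemake --keep-going (-k) lets jobs that do not depend on a failed one continue running.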

Are files defined in the log section of a snakemake rule much different from the ones defined in the output section?

Submitted by 我们两清 on 2019-12-14 03:54:04
Question: As I understand the documentation for the log section of a snakemake rule, one has to "manually" send things to the log files. It seems to me that one could achieve the same result using files defined in the output section. What are the important differences between these two approaches? What is the real usefulness of the log section? Answer 1: For me the best practice for logs in Snakemake is something like this: rule example1: input: file = <input> log: out = '_stdout.log', err = '_stderr.err'
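The practical differences are that log files are not deleted when a rule fails (whereas outputs of a failed job are removed), they are not output files that other rules can depend on, and they must use the same wildcards as the output. A minimal sketch contrasting the two (the mapper command is a placeholder):

    rule map_reads:
        input:
            "reads/{sample}.fq.gz"
        output:
            "mapped/{sample}.bam"            # removed by snakemake if the rule fails
        log:
            "logs/map_reads/{sample}.log"    # kept on failure, useful for debugging
        shell:
            "mapper {input} > {output} 2> {log}"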

Including unforeseen file names as wildcards in Snakemake

Submitted by 南楼画角 on 2019-12-13 03:39:39
Question: gdc-fastq-splitter splits FASTQ files into read groups. For instance, if 3 different read groups are included in dummy.fq.gz, three fastq files will be generated: dummy_readgroup_1.fq.gz, dummy_readgroup_2.fq.gz, dummy_readgroup_3.fq.gz. Given that each original FASTQ file is in a different folder and contains a different number of read groups, the resulting files cannot easily be used as input for the following step via wildcards. Taking into account that I do not know the exact name and number
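This is the same checkpoint pattern as in the chromosome-splitting question above: make the splitter a checkpoint whose output is a per-sample directory, then recover the read-group names with glob_wildcards in an input function. A rough sketch; the gdc-fastq-splitter options and output naming are from memory and may need adjusting, and the per-read-group alignment rule producing aligned/{sample}_{rg}.bam is assumed but not shown:

    checkpoint split_fastq:
        input:
            "fastq/{sample}.fq.gz"
        output:
            directory("readgroups/{sample}")
        shell:
            "mkdir -p {output} && "
            "gdc-fastq-splitter -o {output}/{wildcards.sample}_ {input}"

    def readgroup_bams(wildcards):
        # evaluated after the checkpoint, when the read-group files exist on disk
        rg_dir = checkpoints.split_fastq.get(sample=wildcards.sample).output[0]
        rgs = glob_wildcards(rg_dir + "/" + wildcards.sample + "_{rg}.fq.gz").rg
        return expand("aligned/{sample}_{rg}.bam", sample=wildcards.sample, rg=rgs)

    rule merge_sample:
        input:
            readgroup_bams
        output:
            "merged/{sample}.bam"
        shell:
            "samtools merge {output} {input}"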

Output log file to cluster option

Submitted by ぐ巨炮叔叔 on 2019-12-13 03:00:53
Question: I'm submitting jobs to slurm/sbatch via snakemake, and I'm trying to send the log from sbatch to a file in the same directory tree as the rule's output. For example, this works: rm -rf foo snakemake -s test.smk --jobs 1 --cluster "sbatch --output log.txt" but it fails (i.e. the slurm job status is FAILED) if I try: rm -rf foo snakemake -s test.smk --jobs 1 --cluster "sbatch --output {output}.log" presumably because {output} points to foo/bar/, which does not exist. But snakemake should have created
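One hedged workaround, assuming the --cluster string is formatted with the job's properties and the rule has a single output: create the log directory inside the submit command itself before calling sbatch:

    rm -rf foo
    snakemake -s test.smk --jobs 1 \
        --cluster 'mkdir -p "$(dirname {output[0]})" && sbatch --output {output[0]}.log'

An alternative is to keep cluster logs out of the output tree entirely, e.g. run mkdir -p logs once and submit with --cluster 'sbatch --output logs/{rule}.%j.log' (%j is expanded by sbatch to the job id).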

Snakemake using a rule in a loop

Submitted by 我只是一个虾纸丫 on 2019-12-12 10:46:13
Question: I'm trying to use Snakemake rules within a loop so that each rule takes the output of the previous iteration as input. Is that possible, and if yes, how can I do that? Here is my example. Set up the test data: mkdir -p test echo "SampleA" > test/SampleA.txt echo "SampleB" > test/SampleB.txt Snakemake: SAMPLES = ["SampleA", "SampleB"] rule all: input: # Output of the final loop expand("loop3/{sample}.txt", sample = SAMPLES) #### LOOP #### for i in list(range(1, 4)): # Setup prefix for input if i == 1
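One way this is commonly done is with anonymous rules defined inside a plain Python for loop; a sketch under made-up paths (note the doubled braces in the f-strings so that {sample} survives as a wildcard):

    SAMPLES = ["SampleA", "SampleB"]
    N_ITER = 3

    rule all:
        input:
            expand(f"loop{N_ITER}/{{sample}}.txt", sample=SAMPLES)

    for i in range(1, N_ITER + 1):
        # iteration 1 reads the original test data, later iterations read the previous loop
        prev = "test" if i == 1 else f"loop{i - 1}"

        rule:
            input:
                prev + "/{sample}.txt"
            output:
                f"loop{i}/{{sample}}.txt"
            shell:
                "cat {input} > {output}"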

Snakemake - Override LSF (bsub) cluster config in a rule-specific manner

Submitted by 人走茶凉 on 2019-12-12 10:15:42
Question: Is it possible to define default settings for memory and resources in the cluster config file, and then override them in a rule-specific manner when needed? Is the resources field in rules directly tied to the cluster config file, or is it just a fancier params field for readability purposes? In the example below, how do I use the default cluster config for rule a, but use custom values (memory=40000 and rusage=15000) in rule b? cluster.json: { "__default__": { "memory": 20000, "resources": "\"rusage
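A hedged sketch of how this is usually wired up: the cluster config is only consulted where the --cluster string references it, rules without their own entry fall back to __default__, and the rule-level resources field is a separate mechanism unless the submit string also references it (e.g. {resources.mem_mb}). Values and bsub flags below are illustrative. cluster.json, where rule b overrides the defaults and rule a falls back to __default__:

    {
        "__default__": { "memory": 20000, "resources": "\"rusage[mem=8000]\"" },
        "b":           { "memory": 40000, "resources": "\"rusage[mem=15000]\"" }
    }

and the corresponding invocation:

    snakemake --jobs 100 --cluster-config cluster.json \
        --cluster 'bsub -M {cluster.memory} -R {cluster.resources}'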

split bam files to (variable) pre-defined number of small bam files depending on the sample

Submitted by ﹥>﹥吖頭↗ on 2019-12-11 19:26:40
Question: I want to split multiple bam files into a pre-determined number of smaller bam files. I do not know how to specify the output, because the number of smaller bam files varies depending on which sample I am splitting. I have read https://bitbucket.org/snakemake/snakemake/issues/865/pre-determined-dynamic-output but I do not see how a checkpoint helps in my case. SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] } rule split_bam: input: "{sample}
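Because the number of pieces per sample is fixed up front in SAMPLE_cluster, no checkpoint is strictly needed: one hedged option is to generate one concrete rule per sample in a loop, so that each rule can list its exact outputs (the splitting command itself is a placeholder):

    SAMPLE_cluster = {
        "SampleA": ["1", "2", "3"],
        "SampleB": ["1"],
        "SampleC": ["1", "2"],
    }

    rule all:
        input:
            [f"split/{s}.cluster_{c}.bam"
             for s, clusters in SAMPLE_cluster.items() for c in clusters]

    for s, clusters in SAMPLE_cluster.items():

        rule:
            input:
                f"{s}.bam"
            output:
                [f"split/{s}.cluster_{c}.bam" for c in clusters]
            params:
                n=len(clusters)
            # placeholder splitter: expected to write one bam per pre-defined
            # cluster, named split/<sample>.cluster_<i>.bam
            shell:
                "split_bam_tool --bam {input} --pieces {params.n} --prefix split/" + s

If the per-sample mapping were not known before the workflow runs, that would be the case for a checkpoint, as in the issue linked above.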