how to pass a function under snakemake run directive

大憨熊 提交于 2019-12-11 07:36:32

问题


I am building a workflow in snakemake and would like to recycle one of the rules to two different input sources. The input sources could be either source1 or source1+source2 and depending on the input the output directory would also vary. Since this was quite complicated to do in the same rule and I didn't want to create the copy of the full rule I would like to create two rules with different input/output, but running same command.

Is it possible to make this work? I get the DAG resolved correctly but the job don't go through on the cluster (ERROR : bamcov_cmd not defined).. An example below (both rules use the same command at the end):

this is command

def bamcov_cmd():
return( (deepTools_path+"bamCoverage " +
            "-b {input.bam} " +
            "-o {output} " +
            "--binSize {params.bw_binsize} " +
            "-p {threads} " +
            "--normalizeTo1x {params.genome_size} " +
            "{params.read_extension} " +
            "&> {log}") )

this is the rule

rule bamCoverage:
input:
    bam = file1+"/{sample}.bam",
    bai = file1+"/{sample}.bam.bai"
output:
    "bamCoverage/{sample}.filter.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    bamcov_cmd()

this is the optional rule2

rule bamCoverage2:
input:
    bam = file2+"/{sample}.filter.bam",
    bai = file2+"/{sample}.filter.bam.bai"
output:
    "bamCoverage/{sample}.filter.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    bamcov_cmd()

回答1:


What you asked is possible in python. It depends if you have JUST python code in the file, or python and Snakemake. I will answer that first, and then I have a follow up response because I want you to set it up differently so you don't have to do it this way.

Just Python:

 from fileContainingMyBamCovCmdFunction import bamcov_cmd

 rule bamCoverage:
 ...
     run:
         bamcov_cmd()

Visually, see how I do it in this file, to reference access to buildHeader and buildSample. These files are being called by a Snakefile. It should work the same for you. https://github.com/LCR-BCCRC/workflow_exploration/blob/master/Snakemake/modules/py_buildFile/buildFile.py

EDIT 2017-07-23 - Updating code segment below to reflect user comment

Snakemake and Python:

 include: "fileContainingMyBamCovCmdFunction.suffix"

 rule bamCoverage:
 ...
     run:
         shell(bamcov_cmd())

EDIT END

If the function is truly specific to the bamCoverage call, if you prefer you can put it back in the rule. This implies it's not being called elsewhere, which may be true. Be careful when annotating files using '.' notation, I use '_' as I find it's easier to prevent creating cyclical dependencies this way. Also, if you do end up leaving the two rules separately, you will likely end up with ambiguity errors. http://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=ruleorder#handling-ambiguous-rules When possible, it's best practice to have rules generating unique outputs.

As for alternatives, consider setting up the code like this?

from subprocess import call


rule all:
    input:
          "path/to/file/mySample.bw"
          #OR
          #"path/to/file/mySample_filtered.bw"


bamCoverage:
input:
    bam = file1+"/{sample}.bam",
    bai = file1+"/{sample}.bam.bai"
output:
    "bamCoverage/{sample}.bw"
params:
    bw_binsize = bw_binsize,
    genome_size = int(genome_size),
    read_extension = "--extendReads"
log:
    "bamCoverage/logs/bamCoverage.{sample}.log"
benchmark:
    "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
threads: 16
run:
    callString= deepTools_path + "bamCoverage " \
            + "-b " + wilcards.input.bam \
            + "-o " + wilcards.output \
            + "--binSize " str(params.bw_binsize) \
            + "-p " + str({threads}) \
            + "--normalizeTo1x " + str(params.genome_size) \
            + " " + str(params.read_extension) \
            + "&> " + str(log)
    call(callString, shell=True)


rule filterBam:
input:
    "{pathFB}/{sample}.bam"
output:
    "{pathFB}/{sample}_filtered.bam"
run:
    callString="samtools view -bh -F 512 " + wildcards.input \
    + ' > ' + wildcards.output

    call(callString, shell=True)

Thoughts?



来源:https://stackoverflow.com/questions/45245966/how-to-pass-a-function-under-snakemake-run-directive

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!