Question
Is it possible to define default settings for memory and resources in the cluster config file, and then override them per rule when needed? Is the resources field in rules directly tied to the cluster config file, or is it just a more readable alternative to the params field?
In the example below, how do I use the default cluster config for rule a, but custom values (memory=40000 and rusage=15000) for rule b?
cluster.json:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
Snakefile:
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'
Command for execution:
snakemake --cluster-config cluster.json \
    --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
    -j 50
I understand that it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile, if possible.
Or else, if there is a better way of implementing this, please let me know.
Answer 1:
You can add resources directly to each of your rules:
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    resources:
        mem_mb=20000  # default memory from the question
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=40000  # custom memory requested for rule b
    shell:
        'touch {output}'
Then remove the memory and resources parameters from your .json, so that the command line does not override the Snakefile:
new.cluster.json:
{
    "__default__":
    {
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
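With memory and resources gone from the JSON, the --cluster string can no longer use {cluster.memory} or {cluster.resources}; the values have to come from the rule's own resources, for example {resources.mem_mb}. The Python sketch below only imitates that placeholder expansion to show what the resulting bsub call could look like. It is not Snakemake code, and rusage_mb is a hypothetical custom resource name that each rule would have to define next to mem_mb.

```python
from types import SimpleNamespace

# Template in the style of the --cluster argument.
# rusage_mb is a hypothetical custom resource name, not a Snakemake built-in.
BSUB_TEMPLATE = (
    'bsub -M {resources.mem_mb} '
    '-R "rusage[mem={resources.rusage_mb}] span[hosts=1]"'
)


def render_submit_command(template, **resources):
    """Expand {resources.<name>} placeholders for one job."""
    return template.format(resources=SimpleNamespace(**resources))


# Rule a keeps the defaults from the question, rule b uses the custom values.
cmd_a = render_submit_command(BSUB_TEMPLATE, mem_mb=20000, rusage_mb=8000)
cmd_b = render_submit_command(BSUB_TEMPLATE, mem_mb=40000, rusage_mb=15000)

print(cmd_a)
print(cmd_b)
```

On the real command line this would correspond to passing a template of that shape to --cluster, assuming every rule defines both resources.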
Answer 2:
In new.cluster.json you can actually define resources for specific rules. In your case you would do the following:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
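As far as I understand, Snakemake starts from the `__default__` entry and lets a rule-specific entry override individual keys, so the "b" section would not strictly need to repeat output and error. The sketch below mimics that merge in plain Python to make the resulting per-rule settings visible. The function name settings_for is made up for illustration and is not part of Snakemake.

```python
# Same structure as new.cluster.json, with "b" listing only what differs.
cluster_config = {
    "__default__": {
        "memory": 20000,
        "resources": '"rusage[mem=8000] span[hosts=1]"',
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err",
    },
    "b": {
        "memory": 40000,
        "resources": '"rusage[mem=15000] span[hosts=1]"',
    },
}


def settings_for(rule_name, config):
    """Return the defaults, updated with any rule-specific overrides."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule_name, {}))
    return merged


settings_a = settings_for("a", cluster_config)  # no entry: pure defaults
settings_b = settings_for("b", cluster_config)  # overrides memory/resources

print(settings_a["memory"], settings_a["resources"])
print(settings_b["memory"], settings_b["resources"])
```

Rule a ends up with the default 20000 and rule b with 40000, while both keep the shared log paths.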
Then, in the Snakefile, you can refer to these resources by loading new.cluster.json and referring to it in your rule:
import json

# Load the cluster config so that rules can read values from it
with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        # Memory for rule b comes from its entry in new.cluster.json
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'
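Indexing cluster_config["b"] directly only works for rules that have their own entry in the JSON. A small lookup helper that falls back to `__default__` lets every rule read its memory the same way. This is a minimal sketch with hypothetical names (load_cluster_lookup, mem_for), and it parses an inline JSON string instead of opening new.cluster.json so that it stays self-contained.

```python
import json
from collections import ChainMap


def load_cluster_lookup(json_text):
    """Build a lookup(rule, key) that falls back to the __default__ entry."""
    config = json.loads(json_text)
    defaults = config.get("__default__", {})

    def lookup(rule_name, key):
        # ChainMap searches the rule-specific entry first, then the defaults.
        return ChainMap(config.get(rule_name, {}), defaults)[key]

    return lookup


# Inline stand-in for the contents of new.cluster.json
example_json = json.dumps({
    "__default__": {"memory": 20000},
    "b": {"memory": 40000},
})

mem_for = load_cluster_lookup(example_json)
print(mem_for("a", "memory"))  # rule a has no entry, so the default applies
print(mem_for("b", "memory"))  # rule b has its own value
```

In the Snakefile, a rule could then use something like mem_mb=mem_for("a", "memory") in its resources section, assuming the helper is defined at the top of the file.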
If you take a look through this repository, you can see how I use these cluster configs in the wild.
Source: https://stackoverflow.com/questions/47600510/snakemake-override-lsf-bsub-cluster-config-in-a-rule-specific-manner