问题
I have two locations where my huge data can be stored: /data and /work.
/data is the folder where (intermediate) results are moved to after quality control. It is mounted read-only for the standard user.
/work is the folder where new results are written to. Obviously, it is writable.
I do not want to copy or link data from /data to /work.
So I run my snakemake from within the /work folder and want my input function first to check, if the required file exists in /data (and return the absolute /data path) and if not return the relative path in the /work directory.
def in_func(wildcards):
file_path = apply_wildcards('{id}/{visit}/{id}_{visit}-file_name_1.txt', wildcards)
full_storage_path = os.path.join('/data', file_path)
if os.path.isfile(full_storage_path):
file_path = full_storage_path
return {'myfile': file_path}
rule do_something:
input:
unpack(in_func),
params = '{id}/{visit}/{id}_{visit}_params.txt',
This works fine but I would have to define separate input functions for every rule because the file names differ. Is is possible to write a generic input function that takes as input the file name e.g {id}/{visit}/{id}_{visit}-file_name_1.txt and the wildcards?
I also tried something like
def in_func(file_path):
full_storage_path = os.path.join('/data', file_path)
if os.path.isfile(full_storage_path):
file_path = full_storage_path
file_path
rule do_something:
input:
myfile = in_func('{id}/{visit}/{id}_{visit}-file_name_1.txt')
params = '{id}/{visit}/{id}_{visit}_params.txt',
But then I do not have access to the wildcards in in_func(), do I?
Thanks, Jan
回答1:
You could use something like this:
def handle_storage(pattern):
def handle_wildcards(wildcards):
f = pattern.format(**wildcards)
f_data = os.path.join("/data", f)
if os.path.exists(f_data):
return f_data
return f
return handle_wildcards
rule do_something:
input:
myfile = handle_storage('{id}/{visit}/{id}_{visit}-file_name_1.txt')
params = '{id}/{visit}/{id}_{visit}_params.txt',
In other words, the function handle_storage returns a pointer to the handle_wildcards function that is tailored for the particular pattern. The latter is then automatically applied by Snakemake once the wildcard values are known. Inside that function, we first format the pattern and then check if it exists in /data.
来源:https://stackoverflow.com/questions/45728159/snakemake-generic-input-function-for-different-file-locations