Question
I am trying to create a Snakemake rule that runs bedtools closest between one file and each of a bunch of files in another directory.
What I have: 20 BED files in the /home/bedfiles directory:
1A.bed, 2B_83.bed, 3f_33.bed, ...
What I want: 20 modified BED files in the same /home/bedfiles directory:
1A_modified, 2B_83_modified, 3f_33_modified, ...
In bash, the command would be:
filelist='/home/bedfiles/*.bed'
for mfile in $filelist;
do
    # ${mfile%.bed} strips the .bed suffix, so 1A.bed becomes 1A_modified
    bedtools closest -a /home/other/merged.txt -b "${mfile}" > "${mfile%.bed}_modified"
done
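For comparison, the same loop can be sketched in Python as a dry run that only builds the commands instead of executing them. This is an illustration, not part of Snakemake or bedtools; the function name build_closest_commands is hypothetical. It strips the .bed suffix so that 1A.bed maps to 1A_modified, matching the desired names.

```python
import glob
import os


def build_closest_commands(bed_dir, a_file):
    """Return (argv, output_path) pairs mirroring the bash loop.

    Nothing is executed: this only shows which bedtools command and
    which output file each input .bed would produce.
    """
    jobs = []
    for bed in sorted(glob.glob(os.path.join(bed_dir, "*.bed"))):
        # Strip ".bed" so 1A.bed -> 1A_modified (not 1A.bed_modified)
        out = bed[: -len(".bed")] + "_modified"
        argv = ["bedtools", "closest", "-a", a_file, "-b", bed]
        jobs.append((argv, out))
    return jobs


if __name__ == "__main__":
    # Hypothetical paths taken from the question; prints one command per .bed file
    for argv, out in build_closest_commands("/home/bedfiles", "/home/other/merged.txt"):
        print(" ".join(argv), ">", out)
```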
This loop creates the files with the _modified suffix in the /home/bedfiles directory.
I want to implement this with Snakemake, but I keep getting a syntax error that I don't know how to fix. My attempt is:
Step 1: Get the base name (the part before .bed) of each BED file in the directory
import os
FIRSTPART = [f.split(".")[0] for f in os.listdir("/home/bedfiles") if f.endswith('.bed')]
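f.split(".")[0] works for the names listed above, but it cuts at the first dot, so a name containing an extra dot would be truncated. A minimal sketch on a made-up list of names (bed_stems is a hypothetical helper) that removes only the trailing .bed:

```python
def bed_stems(filenames):
    """Return the part of each *.bed filename before the final ".bed".

    Unlike f.split(".")[0], this keeps any earlier dots in the name.
    """
    return [f[: -len(".bed")] for f in filenames if f.endswith(".bed")]


names = ["1A.bed", "2B_83.bed", "sample.v2.bed", "README.txt"]
print([f.split(".")[0] for f in names if f.endswith(".bed")])  # ['1A', '2B_83', 'sample']
print(bed_stems(names))  # ['1A', '2B_83', 'sample.v2']
```

Snakemake also provides glob_wildcards, which extracts the same values from a pattern, e.g. glob_wildcards("/home/bedfiles/{first}.bed").first.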
Step 2: Define the output names and folder
MODIFIED = expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
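To see what MODIFIED holds, here is a rough pure-Python stand-in for expand() restricted to a single wildcard. expand_one is a hypothetical name; the real expand also handles several wildcards, combining them with a product by default.

```python
def expand_one(pattern, **wildcards):
    """Rough stand-in for expand() with exactly one wildcard:
    format the pattern once per value, preserving order."""
    (name, values), = wildcards.items()
    return [pattern.format(**{name: v}) for v in values]


FIRSTPART = ["1A", "2B_83", "3f_33"]
print(expand_one("/home/bedfiles/{first}_modified", first=FIRSTPART))
# ['/home/bedfiles/1A_modified', '/home/bedfiles/2B_83_modified', '/home/bedfiles/3f_33_modified']
```

So rule all simply receives a plain list of 20 target paths.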
Step 3: List these outputs in rule all:
rule all:
    input: MODIFIED
Step 4: Write a specific rule to run bedtools closest
rule closest:
    input:
        input1 = "/home/other/merged.txt" , \
        input2 = expand("/home/bedfiles/{first}.bed", first=FIRSTPART)
    output:
        expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
    shell:
        """ bedtools closest -a {input.input1} -b {input.input2} > {output} """
It throws the following error at the input: line of rule all:
invalid syntax
Do you know how to get past this error, or another way to implement this?
PS: Writing out the file names one by one is not an option.
Answer 1:
Remove the calls to expand in the definitions of input and output in closest. You're currently passing a list of 20 filenames as input.input2 and a list of 20 filenames as output.
That is, your rule closest currently tries to run once and create 20 files, whereas it should run 20 times and create a single file each time.
In closest, you want input.input2 to be a single file and output to be a single file each time the rule runs:
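To make that concrete: when a placeholder such as {input.input2} or {output} holds a list, Snakemake substitutes it into the shell command as space-separated words. A rough sketch of the command the original rule would build, with two files standing in for the twenty (render_question_command is a hypothetical helper, not a Snakemake function):

```python
def render_question_command(a_file, bed_files, out_files):
    """Approximate the shell string built for the question's rule:
    list-valued placeholders become space-separated words."""
    return "bedtools closest -a {} -b {} > {}".format(
        a_file, " ".join(bed_files), " ".join(out_files)
    )


cmd = render_question_command(
    "/home/other/merged.txt",
    ["/home/bedfiles/1A.bed", "/home/bedfiles/2B_83.bed"],
    ["/home/bedfiles/1A_modified", "/home/bedfiles/2B_83_modified"],
)
print(cmd)
```

The shell redirects only into the first output name; the remaining output names end up as extra arguments to bedtools, so at most one of the expected files can be written and Snakemake would likely report the others as missing.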
import os

FIRSTPART = [f.split(".")[0] for f in os.listdir("/home/bedfiles") if f.endswith('.bed')]
print("These are the input files:")
print([f + ".bed" for f in FIRSTPART])
MODIFIED = expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
print("These will be created")
print(MODIFIED)

rule all:
    input: MODIFIED

rule closest:
    message: """
        Converts /home/other/merged.txt and /some/dir/xyz.bed
        into /some/dir/xyz_modified
        """
    input:
        input1 = "/home/other/merged.txt",
        input2 = "{prefix}.bed"
    output: "{prefix}_modified"
    shell:
        """
        bedtools closest -a {input.input1} -b {input.input2} > {output}
        """
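How a single job gets its files: Snakemake matches each requested target against the output pattern "{prefix}_modified", binds the wildcard, and fills the same value into the inputs. A minimal sketch of that matching step with Python's re module, assuming Snakemake's default wildcard pattern ".+" (infer_closest_job is a hypothetical helper; Snakemake does this internally):

```python
import re


def infer_closest_job(target):
    """Match target against "{prefix}_modified" and derive the job's files.

    {prefix} becomes a greedy ".+" group, as with Snakemake's default
    wildcard constraint. Returns None if this rule cannot make target.
    """
    match = re.fullmatch(r"(?P<prefix>.+)_modified", target)
    if match is None:
        return None  # output pattern does not fit the requested file
    prefix = match.group("prefix")
    return {
        "input1": "/home/other/merged.txt",
        "input2": prefix + ".bed",
        "output": target,
    }


print(infer_closest_job("/home/bedfiles/2B_83_modified"))
```

Because rule all asks for 20 targets, this matching happens 20 times, giving 20 independent jobs.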
Here's an experiment. Move into a temporary directory and, within that directory, run the following:
mkdir bedfiles
touch bedfiles/{a,b,c,d}.bed
Then add a file called Snakefile to your current directory containing the following code:
import os
import os.path
import re

input_dir = "bedfiles"
# Only pick up .bed files, so stray files in the directory are ignored
input_files = [os.path.join(input_dir, f) for f in os.listdir(input_dir) if f.endswith(".bed")]
print(input_files)
# Escape the dot so only a literal ".bed" suffix is replaced
output_files = [re.sub(r"\.bed$", "_modified", f) for f in input_files]
print(output_files)

rule all:
    input: output_files

rule mover:
    input: "{prefix}.bed"
    output: "{prefix}_modified"
    shell:
        """ cp {input} {output} """
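A note on the re.sub call that computes output_files: in a regular expression an unescaped "." matches any character, so ".bed$" also matches names that merely end in "bed". A quick check with made-up paths:

```python
import re

paths = ["bedfiles/a.bed", "bedfiles/notabed"]

# Unescaped: "." matches any character, so the trailing "abed" also matches.
loose = [re.sub(".bed$", "_modified", p) for p in paths]
# Escaped: only a literal ".bed" suffix is replaced.
strict = [re.sub(r"\.bed$", "_modified", p) for p in paths]
print(loose)   # ['bedfiles/a_modified', 'bedfiles/not_modified']
print(strict)  # ['bedfiles/a_modified', 'bedfiles/notabed']
```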
Then run snakemake at the command line (recent Snakemake versions also require a core count, e.g. snakemake --cores 1). Snakemake is goal-oriented: it works out how to make your desired outputs from the files that already exist.
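A toy model of that goal-oriented behaviour in plain Python: start from the requested targets, skip those that already exist, and for the rest infer the required input from the single rule's output pattern. plan is a hypothetical function; the real Snakemake scheduler also compares timestamps, handles many rules, and builds a full dependency graph.

```python
import re


def plan(targets, existing):
    """Toy backward resolution for one rule: {prefix}.bed -> {prefix}_modified.

    Returns (jobs, problems). Not Snakemake's real algorithm: it ignores
    timestamps and supports only this single rule.
    """
    jobs, problems = [], []
    for target in targets:
        if target in existing:
            continue  # already present: nothing to do
        m = re.fullmatch(r"(?P<prefix>.+)_modified", target)
        if m is None:
            problems.append(f"no rule can make {target}")
            continue
        needed = m.group("prefix") + ".bed"
        if needed in existing:
            jobs.append((needed, target))
        else:
            problems.append(f"missing input {needed} for {target}")
    return jobs, problems


existing = {"bedfiles/a.bed", "bedfiles/b.bed", "bedfiles/b_modified"}
targets = ["bedfiles/a_modified", "bedfiles/b_modified", "bedfiles/c_modified"]
print(plan(targets, existing))
```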
Answer 2:
Easy one: the invalid syntax error refers to a missing comma (,) after input1 = "/home/other/merged.txt".
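Roughly, Snakemake turns the named entries of an input: block into keyword arguments of a Python call, so the comma rule is Python's own. A minimal sketch (is_valid_call is a hypothetical helper) that checks both variants with Python's parser:

```python
def is_valid_call(args_source):
    """Return True if args_source parses as a Python call's argument list."""
    try:
        compile(f"f({args_source})", "<input-block>", "eval")
    except SyntaxError:
        return False
    return True


with_comma = 'input1 = "/home/other/merged.txt",\n input2 = "{prefix}.bed"'
without_comma = 'input1 = "/home/other/merged.txt"\n input2 = "{prefix}.bed"'
print(is_valid_call(with_comma), is_valid_call(without_comma))  # True False
```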
Hope it helps
Marc
Source: https://stackoverflow.com/questions/48443572/using-multiple-filenames-as-wildcards-in-snakemake