Snakemake: How to use config file efficiently

Submitted by 不打扰是莪最后的温柔 on 2019-12-11 08:54:43

Question


I'm using the following config file format in Snakemake for some sequencing analysis practice (I have loads of samples, each containing 2 fastq files):

samples:
    Sample1_XY:
        - fastq_files/SRR4356728_1.fastq.gz
        - fastq_files/SRR4356728_2.fastq.gz
    Sample2_AB:
        - fastq_files/SRR6257171_1.fastq.gz
        - fastq_files/SRR6257171_2.fastq.gz

I'm using the following rules at the start of my pipeline to run fastqc and to align the fastq files:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample=config["samples"]),
        expand("bam_files/{sample}.bam", sample=config["samples"]),
        "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_fastqc.html",
        zip="FastQC/{sample}_fastqc.zip"
    params: ""
    wrapper:
        "0.21.0/bio/fastqc"

rule bowtie2:
    input:
         sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
         "bam_files/{sample}.bam"
    log:
         "logs/bowtie2/{sample}.txt"
    params:
         index=config["index"],  # prefix of reference genome index (built with bowtie2-build)
         extra=""
    threads: 8
    wrapper:
         "0.21.0/bio/bowtie2/align"

rule multiqc_fastq:
    input:
         expand("FastQC/{sample}_fastqc.html", sample=config["samples"])
    output:
         "FastQC/fastq_multiqc.html"
    params: ""
    log:
         "logs/multiqc.log"
    wrapper:
         "0.21.0/bio/multiqc"

My issue is with the fastqc rule.

Currently both the fastqc rule and the bowtie2 rule create one output file generated using two inputs SRRXXXXXXX_1.fastq.gz and SRRXXXXXXX_2.fastq.gz.

I need the fastqc rule to generate two files, a separate one for each of the fastq.gz files, but I'm unsure how to index the config file correctly from the fastqc rule's input statement, or how to combine the expand and wildcards commands to solve this. I can get an individual fastq file by adding [0] or [1] to the end of the input statement, but I can't get both to run separately.

I've been messing around trying to get the correct indexing format to access each file separately. The current format is the only one I've managed that allows snakemake -np to generate a job list.

Any tips would be greatly appreciated.


Answer 1:


It appears that each sample has two fastq files, named in the format ***_1.fastq.gz and ***_2.fastq.gz. In that case, the config and code below should work.

config.yaml:

samples:
    Sample_A: fastq_files/SRR4356728
    Sample_B: fastq_files/SRR6257171
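
To make the prefix convention in this config concrete, here is a minimal sketch in plain Python (so it runs without Snakemake). The dict mirrors the YAML above; the helper name read_path is hypothetical and is not part of Snakemake or of the answer's Snakefile.

```python
from itertools import product

# Same mapping as the config.yaml above, written as a Python dict.
samples = {
    "Sample_A": "fastq_files/SRR4356728",
    "Sample_B": "fastq_files/SRR6257171",
}


def read_path(sample, num):
    """Hypothetical helper: path of read file `num` (1 or 2) for `sample`."""
    return f"{samples[sample]}_{num}.fastq.gz"


# Every (sample, read number) combination resolves to one fastq file.
for sample, num in product(samples, (1, 2)):
    print(sample, num, read_path(sample, num))
```

Storing only the shared prefix keeps the config short, at the cost of assuming that every sample follows the _1/_2 naming scheme.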

Snakefile:

# read config info into this namespace
configfile: "config.yaml"
print(config['samples'])  # debug output: show the sample-to-prefix mapping

rule all:
    input:
        expand("FastQC/{sample}_{num}_fastqc.zip", sample=config["samples"], num=['1', '2']),
        expand("bam_files/{sample}.bam", sample=config["samples"]),
        "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: f"{config['samples'][wildcards.sample]}_{wildcards.num}.fastq.gz"
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_{num}_fastqc.html",
        zip="FastQC/{sample}_{num}_fastqc.zip"
    wrapper:
        "0.21.0/bio/fastqc"

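rule bowtie2:
    input:
         sample=lambda wildcards: expand(f"{config['samples'][wildcards.sample]}_{{num}}.fastq.gz", num=[1,2])
    output:
         "bam_files/{sample}.bam"
    log:
         "logs/bowtie2/{sample}.txt"
    params:
         # carried over from the question; assumes config.yaml also defines an "index" key
         index=config["index"],
         extra=""
    threads: 8
    wrapper:
         "0.21.0/bio/bowtie2/align"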

rule multiqc_fastq:
    input:
        expand("FastQC/{sample}_{num}_fastqc.html", sample=config["samples"], num=['1', '2'])
    output:
        "FastQC/fastq_multiqc.html"
    wrapper:
        "0.21.0/bio/multiqc"

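If the list-style config from the question has to stay as it is, one possible alternative is to let a num wildcard pick the list entry by position. The sketch below is plain Python so it can be run without Snakemake; fastqc_input is a hypothetical function name, and SimpleNamespace only stands in for the wildcards object that Snakemake would pass.

```python
from types import SimpleNamespace

# List-style config from the question, written as a Python dict.
config = {
    "samples": {
        "Sample1_XY": [
            "fastq_files/SRR4356728_1.fastq.gz",
            "fastq_files/SRR4356728_2.fastq.gz",
        ],
        "Sample2_AB": [
            "fastq_files/SRR6257171_1.fastq.gz",
            "fastq_files/SRR6257171_2.fastq.gz",
        ],
    }
}


def fastqc_input(wildcards):
    """Hypothetical input function: num '1' or '2' selects list entry 0 or 1."""
    return config["samples"][wildcards.sample][int(wildcards.num) - 1]


# Stand-in for the wildcards Snakemake would supply for one fastqc job.
wc = SimpleNamespace(sample="Sample1_XY", num="2")
print(fastqc_input(wc))
```

In a Snakefile this function could be passed directly as the rule's input (input: sample=fastqc_input), provided the output patterns contain both {sample} and {num} as in the answer above. It relies on the two files being listed in _1, _2 order for every sample.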

Source: https://stackoverflow.com/questions/50138171/snakemake-how-to-use-config-file-efficiently
