snakemake

Snakemake: confusion on how to access config files properly

拜拜、爱过 submitted on 2021-02-15 11:53:52
Question: This question follows on from one I asked previously, and it concerns how to access config files correctly in Snakemake. I have a specific problem, which I'll ask first, and a general problem understanding how indexing works, which I'll ask second. I'm using Snakemake to run an ATAC-seq pipeline from alignment/QC through to motif analysis. A: Specific Question I'm trying to add a rule called trim_galore_pe to trim adapters from my fastq files before
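As a rough illustration of the indexing the question asks about: Snakemake loads a config file into a plain dictionary named `config`, so nested entries are reached with ordinary key lookups. The sketch below assumes a hypothetical config layout; every key, sample name, and path in it is invented for illustration and does not come from the original question.

```python
# Hypothetical contents of the `config` dictionary after a YAML config
# file has been parsed. All keys and paths are illustrative assumptions.
config = {
    "samples": {
        "sampleA": {
            "r1": "fastq/sampleA_R1.fastq.gz",
            "r2": "fastq/sampleA_R2.fastq.gz",
        },
        "sampleB": {
            "r1": "fastq/sampleB_R1.fastq.gz",
            "r2": "fastq/sampleB_R2.fastq.gz",
        },
    },
    "trim_galore": {"quality": 20},
}


def get_fastqs(sample):
    """Return the paired fastq paths for one sample by indexing the config."""
    entry = config["samples"][sample]  # first level: section, second: sample
    return [entry["r1"], entry["r2"]]


print(get_fastqs("sampleA"))
print(config["trim_galore"]["quality"])
```

Inside a rule, a helper like this would typically be called from an input function, for example `input: lambda wildcards: get_fastqs(wildcards.sample)`.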

How to make Snakemake input optional but not empty?

天大地大妈咪最大 submitted on 2021-02-11 14:21:19
Question: I'm building an SQL script out of text data. The relevant part of the script consists of a CREATE TABLE statement and an optional INSERT INTO statement. The values for the INSERT INTO statement are taken from a list of files, each of which may or may not exist; the values of all existing files are merged. The crucial part is that the INSERT INTO statement must be skipped whenever no data file exists. I've created a script in Snakemake that does that. There are two ambiguous rules that create a script:
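The logic described above (merge values from whichever files exist, and skip the INSERT INTO entirely when none do) can be sketched in plain Python. This is not the asker's Snakemake rules, which are cut off in the excerpt. The function names, the single `value` column, and the one-value-per-line file format are all assumptions made for illustration.

```python
import os


def existing_files(candidates):
    """Keep only the candidate paths that exist on disk."""
    return [path for path in candidates if os.path.isfile(path)]


def build_script(table, candidates):
    """Build CREATE TABLE plus an optional INSERT INTO from existing files.

    Assumes (hypothetically) one value per line and a single TEXT column.
    """
    lines = [f"CREATE TABLE {table} (value TEXT);"]
    values = []
    for path in existing_files(candidates):
        with open(path) as handle:
            values.extend(line.strip() for line in handle if line.strip())
    # Skip INSERT INTO when no file exists (or the existing files are empty),
    # since an INSERT with no rows would not be valid SQL.
    if values:
        rows = ", ".join(f"('{value}')" for value in values)
        lines.append(f"INSERT INTO {table} (value) VALUES {rows};")
    return "\n".join(lines)
```

In a Snakefile, `existing_files` could serve as the body of an input function, so that only the files that are present become inputs of the rule.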

How to trace-back exact software version(s) used to generate result-files in a snakemake workflow

♀尐吖头ヾ submitted on 2021-02-11 06:35:34
Question: Say I'm following the best-practice workflow suggested for Snakemake. Now I'd like to know how (i.e. with which software version) a given file, say plots/myplot.pdf, was generated. I found this surprisingly hard, if not impossible, with only the result folder at hand. In more detail, say I generated the results using

    snakemake --use-conda --conda-prefix ~/.conda/myenvs

which will resolve and download the conda environments specified in the rule below (copied from the documentation): rule NAME: input:

Use Github URL for wrapper in Snakemake rule

浪尽此生 submitted on 2021-02-10 05:12:43
Question: I know three working ways to define a wrapper-based rule in a Snakefile:

    rule way1_wrapper_repository:
        wrapper: "0.0.8/bio/samtools_sort"

    rule way2_local_relative_directory:
        wrapper: "local_wrappers/dir/samtools_sort"

    rule way3_local_absolute_directory:
        wrapper: "file:///absolute/path/to/wrapper/samtools_sort"

The documentation states: Alternatively, e.g., for development, the wrapper directive can also point to full URLs, including URLs to local files with absolute paths file:// or relative

Snakemake cannot handle very long command line?

不羁岁月 submitted on 2021-02-08 21:56:04
Question: This is a very strange problem. When the {input} specified in the rule section is a list of fewer than 200 files, Snakemake works fine. But when {input} has more than 500 files, Snakemake just quits with the message "(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)". The complete log did not provide any error messages. For the log, please see: https://github.com/snakemake/snakemake/files/5285271/2020-09-25T151835.613199.snakemake.log The rule that
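The excerpt does not say what caused the failure, so the following is only one thing worth ruling out, not a diagnosis: operating systems cap the total size of a command's arguments, and a very long {input} list can approach that cap. The sketch below estimates the size of an argument list and shows a common workaround, writing the paths to a list file instead of putting them on the command line. The helper names and the sample paths are hypothetical.

```python
import os


def command_bytes(args):
    """Approximate size of an argument list: each argument plus a NUL byte."""
    return sum(len(arg.encode("utf-8")) + 1 for arg in args)


def system_arg_limit():
    """Return the OS limit on arguments plus environment, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. on platforms without os.sysconf


def write_file_list(paths, list_path):
    """Write one path per line so a tool can read the list from a file."""
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return list_path


# Hypothetical input list of 600 files.
paths = [f"data/sample_{i:04d}.txt" for i in range(600)]
print(command_bytes(paths), "bytes; system limit:", system_arg_limit())
```

Whether the tool in the question can read its inputs from such a list file is not known from the excerpt; many command-line tools can, and for those the rule's shell command only needs to pass the path of the list file.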

Snakemake, RNA-seq : How can I execute one subpart of a pipeline or another subpart based on the characteristics of the sample that is analysed?

跟風遠走 submitted on 2021-02-08 03:41:53
Question: I am using Snakemake to design an RNA-seq data analysis pipeline. While I've managed to do that, I want to make my pipeline as adaptable as possible, so that it can deal with single-end (SE) data or paired-end (PE) data within the same run of analyses, instead of analysing SE data in one run and PE data in another. My pipeline is supposed to be designed like this: dataset download that gives 1 file (SE data) or 2 files (PE data) --> set of rules A specific to 1 file OR set of rules
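One common way to branch on a per-sample property is an input function that returns one file for SE samples and two for PE samples. The sketch below shows only that selection logic in plain Python. The sample table, the layout labels, and the file naming scheme are assumptions for illustration, since the excerpt does not show the asker's actual rules.

```python
# Hypothetical sample sheet mapping each sample to its library layout.
SAMPLES = {
    "sample1": "SE",
    "sample2": "PE",
}


def fastq_inputs(sample):
    """Return one fastq path for SE samples and two for PE samples."""
    layout = SAMPLES[sample]
    if layout == "SE":
        return [f"raw/{sample}.fastq.gz"]
    if layout == "PE":
        return [f"raw/{sample}_1.fastq.gz", f"raw/{sample}_2.fastq.gz"]
    raise ValueError(f"unknown layout {layout!r} for sample {sample!r}")


for name in SAMPLES:
    print(name, fastq_inputs(name))
```

In a Snakefile this would be wired in as `input: lambda wildcards: fastq_inputs(wildcards.sample)`, letting a single run mix both kinds of samples.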

awk command fails in snakemake --use-singularity

萝らか妹 submitted on 2021-02-07 08:45:28
Question: I am trying to combine Snakemake with Singularity, and I noticed that a simple awk command no longer works when using Singularity. The $1 in the last line gets replaced by bash instead of being used as the first field by awk. Here is a minimal working example (Snakefile):

    singularity: "docker://debian:stretch"

    rule all:
        input: "test.txt"

    rule test:
        output: "test.txt"
        shell: "cat /etc/passwd | awk -F':' '{{print $1}}' > {output}"

When I run snakemake without singularity, the output test.txt
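Two layers of processing are at play in a command like the one above. First, doubled braces are turned into literal braces when the shell string is formatted. Second, the result may be handed to another shell, and how it is quoted there decides whether `$1` survives. The sketch below uses Python's `str.format` as a stand-in for Snakemake's formatting. The two wrapper functions are hypothetical and only illustrate one mechanism that would produce the reported symptom; they are not Snakemake's actual container invocation.

```python
import shlex

# Step 1: brace escaping. "{{" and "}}" become literal braces for awk,
# while {output} is substituted.
template = "cat /etc/passwd | awk -F':' '{{print $1}}' > {output}"
command = template.format(output="test.txt")
print(command)


def wrap_naive(cmd):
    """Hypothetical wrapper: the command is placed inside double quotes.

    The outer shell still expands $1 inside double quotes, so awk
    would never see it.
    """
    return 'bash -c "' + cmd + '"'


def wrap_quoted(cmd):
    """Hypothetical wrapper: shlex.quote protects $1 from the outer shell."""
    return "bash -c " + shlex.quote(cmd)


print(wrap_naive(command))
print(wrap_quoted(command))
```

With the quoted form, the inner shell receives the command unchanged, which is the property the nested invocation needs for awk to see `$1`.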
