How to trace back the exact software version(s) used to generate result files in a Snakemake workflow

Say I'm following the best-practice workflow layout suggested for Snakemake. Now I'd like to know how (i.e. with which software versions) a given file, say plots/myplot.pdf, was generated. I found this surprisingly hard, if not impossible, with only the result folder at hand.

In more detail, say I generated the results using snakemake --use-conda --conda-prefix ~/.conda/myenvs, which resolves and downloads the conda environments specified in the rule below (copied from the documentation):

rule NAME:

Say the content of envs/ggplot.yaml is the following:

channels:
  - conda-forge
dependencies:
  - r-ggplot2

After completion, the ggplot environment will have been saved under, say, ~/.conda/myenvs/d2d1d57b (note that the env name d2d1d57b is assigned automatically by Snakemake).

The problem is that if I ship the workflow subfolder, e.g. as the result to someone else (or as a supplement to a paper), I don't know which ggplot version was used for that run. All I know is the content of the yaml file (which is also included when using --report). Also, since ggplot depends on other software, such as R, I wouldn't know which R version was used for a given rule using this environment, since the yaml file doesn't list indirect dependencies.

Ideally, I'd like to have the complete list of software versions in the environment shipped with the workflow results. As a workaround, one could use conda env export -n name_of_env and copy the output into the result folder, but strangely conda list -n ~/.conda/myenvs/d2d1d57b does not work (it fails with the error Characters not allowed: ('/', ' ', ':', '#')).

Creating an environment manually and inspecting it indeed gives me (among other info):

r-base                    4.0.2                he766273_1    conda-forge
r-ggplot2                 3.3.2             r40h6115d3f_0    conda-forge

That's exactly what I'm after, but doing this manually would of course be too tedious.

This is also true when using wrappers as far as I can tell.

In summary, given a workflow, or even a given file within the workflow, how can I trace back which exact software version(s) were used to generate it? Ideally, this information would be shipped automatically with the results of a workflow by default.

Maybe I'm even missing something very obvious, so hopefully someone can shed some light on this.


Based on our discussion in the comments, you could export your environment to a log file:

rule NAME:
    # input, output, conda and log directives go here
    shell:
        "conda env export > {log}"

However, as you indicate, this won't work if people do not use --use-conda, and it is tedious to add this to each rule, so you could try something like this (not tested, might not work):

if workflow.use_conda:
    shell.prefix("set -o pipefail; conda env export > {log}; ")

This adds the export to each shell command.

If you use scripts, I am less sure how to continue. The "easiest" option might be to just call "conda env export" as a shell command from inside the Python/R script.
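
For a Python script, calling conda env export could look roughly like the sketch below. It is untested against a real workflow and rests on assumptions: that the CONDA_PREFIX environment variable is set while the script runs inside the activated environment, and that the conda executable is on the PATH. The function names build_export_command and export_active_env are hypothetical helpers, not part of Snakemake or conda.

```python
import os
import subprocess


def build_export_command(env_prefix):
    """Return the conda command that exports the environment at env_prefix."""
    # -p/--prefix takes a path, whereas -n/--name only accepts a name.
    return ["conda", "env", "export", "-p", env_prefix]


def export_active_env(out_path, run=subprocess.run, environ=os.environ):
    """Write the active conda environment to out_path; return True on success."""
    # Assumption: CONDA_PREFIX points at the currently activated environment.
    env_prefix = environ.get("CONDA_PREFIX")
    if not env_prefix:
        # Not inside a conda environment (e.g. run without --use-conda).
        return False
    result = run(
        build_export_command(env_prefix),
        capture_output=True,
        text=True,
        check=True,
    )
    with open(out_path, "w") as handle:
        handle.write(result.stdout)
    return True
```

Inside a Snakemake script this might be called as export_active_env(snakemake.log[0]), assuming the rule declares a log file. When the workflow runs without --use-conda, the function returns False instead of failing.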


The shell prefix trick does not seem to work, so I have struck through that text.
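
Another option is to inspect the environments after the run. The error in the question comes from passing a path to -n, which expects an environment name; conda list -p <path> and conda env export -p <path> accept a path instead. The same information can also be read without calling conda at all. The sketch below assumes that each environment directory under the --conda-prefix folder contains a conda-meta folder with one JSON record per installed package (with name, version and build fields). The function names list_packages and snapshot_prefix are hypothetical.

```python
import json
from pathlib import Path


def list_packages(env_dir):
    """Return sorted (name, version, build) tuples for one conda environment."""
    records = []
    # Assumption: every installed package has a JSON record in <env>/conda-meta/.
    for meta_file in Path(env_dir, "conda-meta").glob("*.json"):
        with open(meta_file) as handle:
            record = json.load(handle)
        records.append((record["name"], record["version"], record.get("build", "")))
    return sorted(records)


def snapshot_prefix(conda_prefix):
    """Map each environment directory under conda_prefix to its package list."""
    snapshot = {}
    for env_dir in sorted(Path(conda_prefix).expanduser().iterdir()):
        # Only directories that contain conda-meta are treated as environments.
        if (env_dir / "conda-meta").is_dir():
            snapshot[env_dir.name] = list_packages(env_dir)
    return snapshot
```

Calling snapshot_prefix("~/.conda/myenvs") would then give one package list per environment (e.g. one for d2d1d57b), which could be written into the result folder next to the plots.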


As @Maarten-vd-Sande mentioned, versions should be specified in the conda env file. As you may have suspected, you will also need to list r-base and its version in the conda env file to ensure that a specific version of R is used. See here for an example from a snakemake-wrapper.

As part of best practices for reproducible research, it is highly recommended to specify tool versions in conda env files. Snakemake-wrappers typically follow this rule, but you might find some that do not.
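
To check quickly whether an env file follows this recommendation, something like the sketch below could flag dependencies without an exact pin. It is a rough heuristic, not conda's own spec parser: it takes the dependencies list as plain strings (as it would come out of the yaml file), accepts only name=version or name=version=build, and treats ranges and wildcards as unpinned. The name unpinned is hypothetical.

```python
import re

# Matches "name=version" or "name=version=build"; ranges and wildcards do not count.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}[0-9][^=<>!*,|\s]*(=[^=\s]+)?$")


def unpinned(dependencies):
    """Return the dependency specs that lack an exact version pin."""
    return [
        spec
        for spec in dependencies
        # Non-string entries (e.g. a nested pip section) are skipped here.
        if isinstance(spec, str) and not PINNED.match(spec.strip())
    ]
```

With the versions from the question, unpinned(["r-base=4.0.2", "r-ggplot2"]) reports only r-ggplot2, while pinning both r-base=4.0.2 and r-ggplot2=3.3.2 leaves nothing to report.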

