Snakemake and pandas syntax

試著忘記壹切 submitted on 2019-12-11 11:06:18

Question


I have an input file as follows:

SampleName Run Read1 Read2
A run1 test/true_data/4k_R1.fq test/true_data/4k_R2.fq
A run2 test/samples/A.fastq test/samples/A2.fastq
B run1 test/samples/B.fastq test/samples/B2.fastq
C run1 test/samples/C.fastq test/samples/C5.fastq
D

So I collect all the sample names (the index values) in an array:

import pandas as pd

sample_table = pd.read_table('samples.tsv', sep=' ', lineterminator='\n')
sample_table = sample_table.drop_duplicates(subset='SampleName', keep='first', inplace=False)
sample_table = sample_table.dropna()
sample_table.set_index('SampleName', inplace=True)
sample_ID = sample_table.index.values
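A minimal, self-contained sketch of the parsing step above, assuming the file is space-separated exactly as shown. The table is read from an in-memory string via io.StringIO instead of samples.tsv, and pd.read_csv with sep=" " is used, which behaves like read_table once sep is given explicitly. load_sample_table is a hypothetical helper name. It shows that sample D, which has no reads, is removed by dropna(), and that only the first run of sample A survives drop_duplicates.

```python
import io

import pandas as pd

# Same content as samples.tsv in the question; a string stands in for the file.
SAMPLES_TSV = """SampleName Run Read1 Read2
A run1 test/true_data/4k_R1.fq test/true_data/4k_R2.fq
A run2 test/samples/A.fastq test/samples/A2.fastq
B run1 test/samples/B.fastq test/samples/B2.fastq
C run1 test/samples/C.fastq test/samples/C5.fastq
D
"""


def load_sample_table(text: str) -> pd.DataFrame:
    """Parse the space-separated table and index it by sample name (hypothetical helper)."""
    table = pd.read_csv(io.StringIO(text), sep=" ")
    # Keep only the first run of each sample, so A/run2 is dropped.
    table = table.drop_duplicates(subset="SampleName", keep="first")
    # Row D has only a name; its other columns are NaN, so dropna() removes it.
    table = table.dropna()
    return table.set_index("SampleName")


sample_table = load_sample_table(SAMPLES_TSV)
sample_ID = sample_table.index.values

print(list(sample_ID))                 # ['A', 'B', 'C']
print(sample_table.loc["A", "Read1"])  # test/true_data/4k_R1.fq
```

Looking up a real sample name with .loc works as expected; the problem in the rule below is only which label gets passed in.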

At this point sample_ID=['A' 'B' 'C'], which is what I want. Next, I want to set a variable r1 that corresponds to Read1 and r2 that corresponds to Read2 of each sample.

rule all:
    input:
        expand("test/fltr/{ID_sample}.fq", ID_sample=sample_ID)

rule send_reads:
    input:
        #Tried both ways, but neither works
        r1=sample_table.loc["{ID_sample}",'Read1']
        r2=sample_table.Read2["{ID_sample}"]
    output:
       "test/fltr/{ID_sample}{input.r1}.fq"
    shell:
       "touch {output}"

I get the error

the label [{ID_sample}] is not in the [index]
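This error is a plain pandas KeyError, not a Snakemake syntax error. The sketch below reproduces it outside Snakemake, using the filtered table from the question built inline with pd.DataFrame; lookup is a hypothetical helper. The point it illustrates: the expression written under input: is ordinary Python that runs once when the Snakefile is parsed, before any wildcard has a value, so pandas receives the literal string "{ID_sample}" as a row label.

```python
import pandas as pd

# The filtered table from the question (samples A, B, C), built inline.
sample_table = pd.DataFrame(
    {
        "Read1": ["test/true_data/4k_R1.fq", "test/samples/B.fastq", "test/samples/C.fastq"],
        "Read2": ["test/true_data/4k_R2.fq", "test/samples/B2.fastq", "test/samples/C5.fastq"],
    },
    index=pd.Index(["A", "B", "C"], name="SampleName"),
)


def lookup(label: str, column: str) -> str:
    """Return one read path; .loc raises KeyError for a label not in the index."""
    return sample_table.loc[label, column]


# A real sample name works.
print(lookup("A", "Read1"))  # test/true_data/4k_R1.fq

# At parse time no wildcard has been substituted, so the input expression
# effectively asks pandas for a row literally named "{ID_sample}".
try:
    lookup("{ID_sample}", "Read1")
except KeyError as err:
    print("KeyError:", err)
```

The exact wording of the KeyError message differs between pandas versions, but the cause is the same.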

Is this a syntax error or a bigger mistake?

I am just starting with Snakemake; I thought I had understood it after the tutorial, but obviously I did not.

Thanks a lot, Cheers


Answer 1:


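A lambda function (an input function) can be used to get that value. Expressions written directly under input: are evaluated once, when the Snakefile is parsed and before any wildcard has a value, so "{ID_sample}" is just a literal string there and pandas cannot find it in the index. A function, by contrast, is called later with the wildcards object, once Snakemake knows which sample it is building.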

input:
    r1=lambda wildcards: sample_table.loc[wildcards.ID_sample, 'Read1'],
    r2=lambda wildcards: sample_table.loc[wildcards.ID_sample, 'Read2']
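If both reads are needed, Snakemake also accepts a single input function that returns a dict, wrapped in unpack(), so that each key becomes a named input (input.r1, input.r2). A minimal sketch, assuming the sample_table from the question; the function name read_pair is hypothetical, and types.SimpleNamespace only stands in for the wildcards object Snakemake would pass, so the function can be tried outside a Snakefile.

```python
from types import SimpleNamespace

import pandas as pd

# The filtered table from the question (samples A, B, C), built inline.
sample_table = pd.DataFrame(
    {
        "Read1": ["test/true_data/4k_R1.fq", "test/samples/B.fastq", "test/samples/C.fastq"],
        "Read2": ["test/true_data/4k_R2.fq", "test/samples/B2.fastq", "test/samples/C5.fastq"],
    },
    index=pd.Index(["A", "B", "C"], name="SampleName"),
)


def read_pair(wildcards):
    """Hypothetical input function: runs only after the wildcards are known."""
    row = sample_table.loc[wildcards.ID_sample]
    return {"r1": row["Read1"], "r2": row["Read2"]}


# Stand-in for the call Snakemake would make when it needs test/fltr/B.fq.
print(read_pair(SimpleNamespace(ID_sample="B")))
# {'r1': 'test/samples/B.fastq', 'r2': 'test/samples/B2.fastq'}
```

In the Snakefile, the rule would then use input: unpack(read_pair) and refer to {input.r1} and {input.r2} in shell:. Whether this reads better than two separate lambdas is mostly a matter of taste; the dict version keeps the per-sample lookup in one place.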

Also, given your rule all, the output needs to be test/fltr/{ID_sample}.fq rather than test/fltr/{ID_sample}{input.r1}.fq. And you need a comma to separate the two named inputs.
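Why the output pattern has to line up with rule all: rule all asks for concrete file names, and Snakemake looks for a rule whose output pattern matches each name, reading the wildcard values off the match (by default a wildcard matches the regular expression .+). The sketch below is a deliberately simplified model of that in plain Python; the list comprehension standing in for expand() and the helper match_output are illustrative assumptions, not Snakemake internals.

```python
import re

sample_ID = ["A", "B", "C"]

# Roughly what expand("test/fltr/{ID_sample}.fq", ID_sample=sample_ID) yields.
targets = ["test/fltr/{ID_sample}.fq".format(ID_sample=s) for s in sample_ID]


def match_output(pattern: str, path: str):
    """Return the wildcard values if path matches pattern, else None.

    Simplified model: each {name} becomes a named group matching .+,
    and the rest of the pattern is matched literally.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += "(?P<%s>.+)" % m.group(1)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


for t in targets:
    print(t, match_output("test/fltr/{ID_sample}.fq", t))
# first line printed: test/fltr/A.fq {'ID_sample': 'A'}
```

Once ID_sample is known from the requested file name, the input functions can look up the matching reads; this is why the output pattern should contain only the wildcard and not values derived from the input.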



Source: https://stackoverflow.com/questions/52273322/snakemake-and-pandas-syntax
