AWK: working with a VCF (text) file

青春壹個敷衍的年華 submitted on 2019-12-25 04:53:38

Question


I would like to write awk code that modifies a VCF file as follows:

  1. Make all columns tab-delimited
  2. Delete all lines that start with "##" (e.g. "##text")
  3. Keep the header line, which starts with a single "#" (e.g. "#header")

I have this code, but it does not work:

#!/bin/bash
for i in *.vcf; do
    awk 'BEGIN {print  "CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILT\tINFO\tFORMAT"}' |
    awk '{$1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9}' $i |
    awk '!/#/' > ${i%.vcf}.tsv;
done

INPUT:

> ##fileformat=VCFv4.1
> ##FORMAT=<ID=GQX,Number=1,Type=Integer,Description="Minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
> #CHROM    POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1
> chr1  10385471    rs17401966  A   G   100.00  PASS    DP=67;TI=NM_015074;GI=KIF1B;FC=Silent   GT:GQ:AD:VF:NL:SB:GQX   0/1:100:29,38:0.5672:20:-100.0000:100
> chr1  17380497    rs2746462   G   T   100.00  PASS    DP=107;TI=NM_003000;GI=SDHB;FC=Synonymous_A6A;EXON  GT:GQ:AD:VF:NL:SB:GQX   1/1:100:0,107:1.0000:20:-100.0000:100
> chr1  222045446   rs6691170   G   T   100.00  PASS    DP=99   GT:GQ:AD:VF:NL:SB:GQX   0/1:100:49,50:0.5051:20:-100.0000:100

OUTPUT (what I want):

> CHROM POS   ID          REF  ALT  QUAL    FILTER  INFO             etc...
> chr1  10385471  rs17401966  A   G   100.00  PASS    DP=67;TI=NM_015074;GI=KIF1B;

Answer 1:


You want to put your whole program in a single awk call:

for f in *.vcf; do
    awk '
        BEGIN {OFS = "\t"}
        /^##/ {next}
        /^#/ {sub(/^#/,"",$1)}
        {$1=$1; print}
    ' "$f" > "${f/%vcf/tsv}"
done

This program will skip any record that begins with ##, will remove the leading hash for lines that have it, and then print each line using tab as the field separator.
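To check what the program produces without creating any files, you can wrap the same awk program in a function that reads standard input and feed it a few lines from the question's input. This is a minimal sketch: the function name `vcf_to_tsv` is hypothetical, and the sample is a shortened, space-separated copy of the input above.

```shell
# Hypothetical helper: the answer's awk program, reading standard input.
vcf_to_tsv() {
    awk '
        BEGIN {OFS = "\t"}
        /^##/ {next}
        /^#/ {sub(/^#/,"",$1)}
        {$1=$1; print}
    '
}

# Shortened sample taken from the question's input (space-separated columns).
printf '%s\n' \
    '##fileformat=VCFv4.1' \
    '#CHROM  POS  ID  REF  ALT' \
    'chr1  10385471  rs17401966  A  G' |
    vcf_to_tsv
# Expected output, with <TAB> standing for a tab character:
# CHROM<TAB>POS<TAB>ID<TAB>REF<TAB>ALT
# chr1<TAB>10385471<TAB>rs17401966<TAB>A<TAB>G
```

Because the function reads standard input, it can also be used inside the loop as `vcf_to_tsv < "$f" > "${f/%vcf/tsv}"`.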

awk programs are a series of condition {action} pairs. For each record in the input, if the condition is true, the action block is performed, otherwise it is ignored. If the condition is omitted, the action block is performed unconditionally.
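As a small illustration of the condition {action} model, the sketch below runs three lines through three rules, including the `next` statement the answer uses to skip `##` lines. The function name `demo_rules` is hypothetical.

```shell
# Hypothetical demo: each rule is tested against every record, in order.
demo_rules() {
    awk '
        /^##/ {next}                  # "##" lines: stop here, go to the next record
        /^#/  {print "header:", $0}   # runs only for lines starting with "#"
              {print "every:", $0}    # no condition: runs for every record that gets here
    '
}

printf '%s\n' '##meta' '#CHROM' 'chr1' | demo_rules
# Expected output:
# header: #CHROM
# every: #CHROM
# every: chr1
```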

One tricky bit in this example is $1=$1 -- when fields are modified, awk will re-build the record, joining the fields using the output field separator (OFS variable).
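A quick way to see the effect of `$1=$1` is to print each record before and after the assignment. The function name `rebuild_demo` is hypothetical, and OFS is set to "-" instead of a tab only so the change is visible on screen. The first `print` also shows that setting OFS alone does not change an unmodified record.

```shell
# Hypothetical demo: print the record as read, then after forcing a rebuild.
rebuild_demo() {
    awk 'BEGIN {OFS = "-"} {print; $1 = $1; print}'
}

echo 'a   b  c' | rebuild_demo
# Expected output:
# a   b  c    (record untouched, original spacing kept)
# a-b-c       (after $1=$1, fields joined with OFS)
```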



Source: https://stackoverflow.com/questions/19449828/awk-work-wit-vcf-text-file
