While read line, awk $line and write to variable

Submitted by 自作多情 on 2020-01-03 05:37:07

Question


I am trying to split a file into several smaller files depending on the value of the fifth field. A very nice way to do this was already suggested in earlier Stack Overflow answers.

However, I am trying to incorporate this into a .sh script for qsub, without much success.

The problem is in the part that specifies which file each line is written to, i.e.

f = "Alignments_" $5 ".sam"
print > f

There I need to use a variable declared earlier in the script that holds the directory where the file should be written. That variable is built separately for each task when I submit the array job for multiple files.

So, say $output_path is ./Sample1. I need to write something like

f = $output_path "/Alignments_" $5 ".sam"
print > f

But it does not seem to like having a $variable that is not a $field belonging to awk. I don't even think it likes having two "strings" before and after the $5.

The error I get is that awk takes the first line of the file being split (little.sam) and uses it as the start of the name f, followed by "/Alignments_" $5 ".sam" (those last three pieces concatenated correctly). Naturally, it then complains that the file name is too long.

How can I write this so it works?

Thanks!

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    f = "Alignments_" $5 ".sam"        print > f
} ' Tile_Number_List.txt little.sam

UPDATE, after adding -v to awk and declaring the variable opath:

input=$1
outputBase=${input%.bam}

mkdir -v $outputBase\_TEST

newdir=$outputBase\_TEST

samtools view -h $input | awk 'NR >= 18' | awk -F '[\t:]' -v opath="$newdir" '

FNR == NR {
    num[$1]
    next
}

$5 in num {
    f = newdir"/Alignments_"$5".sam";
    print > f
} ' Tile_Number_List.txt -

mkdir: created directory 'little_TEST'
awk: cmd. line:10: (FILENAME=- FNR=1) fatal: can't redirect to `/Alignments_1101.sam' (Permission denied)

Answer 1:


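awk variables are like C variables: just reference them by name to get their value. There is no need to put a "$" in front of them as you do with shell variables. In awk, "$" is the field operator, so $output_path with output_path unset (numerically 0) is $0, the whole current line; that is why your file name started with the first line of little.sam. For example: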

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    output_path = "./Sample1/"
    f = output_path "Alignments_" $5 ".sam"
    print > f
} ' Tile_Number_List.txt little.sam
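Both points raised in the question (the odd file name produced by $output_path, and whether strings may sit on both sides of $5) can be checked in isolation. This is a minimal sketch assuming a POSIX-compatible awk; the input line and the name output_path are made up for illustration.

```shell
# A made-up input line; split on ":" or tab, its 5th field is 1101.
line='r1:a:b:c:1101:x'

# In awk "$" is the field operator. output_path is never set, so it is 0 and
# $output_path is $0, the whole input line.
bad=$(printf '%s\n' "$line" | awk -F '[:\t]' '{ print $output_path "/Alignments_" $5 ".sam" }')
echo "$bad"      # r1:a:b:c:1101:x/Alignments_1101.sam

# Without the "$" it is a plain awk variable; strings on both sides of $5
# are simply concatenated.
good=$(printf '%s\n' "$line" | awk -F '[:\t]' '{ output_path = "./Sample1"; print output_path "/Alignments_" $5 ".sam" }')
echo "$good"     # ./Sample1/Alignments_1101.sam
```

awk has no concatenation operator; adjacent expressions are joined, so any number of string literals around $5 is fine.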



Answer 2:


To pass the value of a shell variable such as $output_path to awk, use the -v option.

$ output_path=./Sample1/

$ awk -F '[:\t]' -v opath="$output_path" '
    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
    $5 in num {
        f = opath"Alignments_"$5".sam"
        print > f
    } ' Tile_Number_List.txt little.sam
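A small sketch to confirm that -v really hands the shell value to awk, and what a misspelled shell variable (such as $ouput_path) does instead. It runs in a temporary directory; the sample lines and the Sample1 directory are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir Sample1

output_path=./Sample1/

# Two made-up lines whose 5th field (split on ":" or tab) is 1101 and 1102.
printf 'r1:a:b:c:1101\nr2:a:b:c:1102\n' > little.sam

# -v copies the shell value into the awk variable opath before input is read.
awk -F '[:\t]' -v opath="$output_path" '{
    f = opath "Alignments_" $5 ".sam"
    print > f
}' little.sam
ls Sample1

# A misspelled shell variable (as in "$ouput_path") expands to "", so opath
# is empty and the computed names point at the current directory.
misspelled=$(awk -F '[:\t]' -v opath="$ouput_path" '{ print opath "Alignments_" $5 ".sam" }' little.sam)
echo "$misspelled"
```

The same mechanism explains the "Permission denied" in the update: an awk variable that was never assigned (there, newdir) is empty, so "/Alignments_1101.sam" points at the filesystem root.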

Also, your script still contains the error from your previous question.

EDIT:

The awk variable created with -v is opath, but you use newdir. What you want is:

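input=$1
outputBase=${input%.bam}
newdir="${outputBase}_TEST"
mkdir -v "$newdir"

samtools view -h "$input" | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR {
    num[$1]
    next
}
FNR >= 18 && $5 in num {
    f = opath"/Alignments_"$5".sam"   # <-- opath is the awk variable, not newdir
    print > f
}' Tile_Number_List.txt -

This also moves the NR >= 18 filter from the separate awk command into the main awk script. It has to be written as FNR >= 18 on the second rule: FNR restarts at 1 for each input file, so it skips the first 17 lines of the samtools output (read from "-"), whereas NR >= 18 inside the FNR == NR rule would instead skip the first 17 entries of Tile_Number_List.txt.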
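The two-file idiom (FNR == NR for the tile list, then the stream on "-") and the skipping of the first 17 stream lines can be exercised without samtools. In this sketch fake_sam is a hypothetical stand-in for samtools view -h "$input"; its 17 leading lines are deliberately given tile 1101 so that the skip is visible, and all names and values are invented.

```shell
# Throwaway working directory and made-up inputs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

input=little.bam                  # only its name is used here
outputBase=${input%.bam}          # -> little
newdir="${outputBase}_TEST"
mkdir -p "$newdir"

# Tile numbers to keep; 1103 is deliberately absent.
printf '1101\n1102\n' > Tile_Number_List.txt

# Hypothetical stand-in for: samtools view -h "$input"
# 17 leading lines that would match tile 1101 if not skipped, then records.
fake_sam() {
    i=1
    while [ "$i" -le 17 ]; do
        printf 'hdr%s:a:b:c:1101\n' "$i"
        i=$((i + 1))
    done
    printf 'r1:x:y:z:1101\tACGT\n'
    printf 'r2:x:y:z:1102\tTTTT\n'
    printf 'r3:x:y:z:1103\tGGGG\n'
}

fake_sam | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR { num[$1]; next }            # first file: remember wanted tiles
FNR >= 18 && $5 in num {               # stream on "-": skip its first 17 lines
    f = opath "/Alignments_" $5 ".sam"
    print > f
}' Tile_Number_List.txt -

ls "$newdir"
```

Skipping a fixed 17 lines mirrors the original NR >= 18. Since SAM header lines start with "@", a pattern such as !/^@/ might be a sturdier filter if the header length varies between files, though that is a suggestion beyond the original answers.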



Source: https://stackoverflow.com/questions/16407721/while-read-line-awk-line-and-write-to-variable
