read multiple files in bash

蓝咒 posted on 2021-01-28 02:50:37

Question


I have two .txt files that I want to read line by line simultaneously in a .sh script. Both files have the same number of lines. Inside the loop I want to use sed to replace full_sample_name and sample_name in another file. I know how to do this when reading a single file, but I cannot get it to work for two files.

#! /bin/bash

FULL_SAMPLE="file1.txt"
SAMPLE="file2.txt"

while read ... && ...
do
    sed -e "s/\<full_sample_name\>/$FULL_SAMPLE/g" -e "s/\<sample_name\>/$SAMPLE/g" pipeline.sh > $SAMPLE.sh

done < ...?

Answer 1:


#!/bin/bash

full_sample_file="file1.txt"
sample_file="file2.txt"

while read -r -u 3 full_sample_name && read -r -u 4 sample_name; do
    sed -e "s/\<full_sample_name\>/$full_sample_name/g" \
        -e "s/\<sample_name\>/$sample_name/g" \
        pipeline.sh >"$sample_name.sh"
done 3<"$full_sample_file" 4<"$sample_file" # automatically closed on loop exit

In this case, I'm assigning file descriptor 3 to file1.txt and file descriptor 4 to file2.txt.
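
As a runnable illustration of the loop above, here is a minimal sketch that builds two small input files and reads them in lockstep. The temporary directory, the file contents, and the pairs array are made up for the demo and are not part of the original question. Because the loop reads from descriptors 3 and 4, standard input stays free for any command run inside the loop.

```shell
#!/bin/bash
# Demo data: file names and contents are invented for illustration.
workdir=$(mktemp -d)
printf '%s\n' 'S1_L001_full' 'S2_L001_full' >"$workdir/file1.txt"
printf '%s\n' 'S1' 'S2' >"$workdir/file2.txt"

pairs=()
# fd 3 reads file1.txt and fd 4 reads file2.txt; stdin (fd 0) is untouched.
# The loop stops as soon as either read fails, i.e. at the end of the shorter file.
while IFS= read -r -u 3 full_sample_name && IFS= read -r -u 4 sample_name; do
    pairs+=("$full_sample_name -> $sample_name")
done 3<"$workdir/file1.txt" 4<"$workdir/file2.txt"

printf '%s\n' "${pairs[@]}"
rm -rf "$workdir"
```

Each line of file1.txt is paired with the line at the same position in file2.txt.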


By the way, with bash 4.1 or newer, you no longer need to choose file descriptor numbers by hand; bash can allocate free ones for you and store them in variables:

# opening explicitly, since even if opened on the loop, these need
# to be explicitly closed.
exec {full_sample_fd}<file1.txt
exec {sample_fd}<file2.txt

while read -r -u "$full_sample_fd" full_sample_name \
   && read -r -u "$sample_fd" sample_name; do
  : do stuff here with "$full_sample_name" and "$sample_name"
done

# close the files explicitly
exec {full_sample_fd}>&- {sample_fd}>&-
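
To see the automatic allocation at work, here is a small self-contained sketch (bash 4.1 or newer assumed). The temporary files stand in for file1.txt and file2.txt, and the count and last_pair variables exist only for the demo.

```shell
#!/bin/bash
# Requires bash 4.1+. The temporary files are illustrative stand-ins.
workdir=$(mktemp -d)
printf '%s\n' 'full_A' 'full_B' >"$workdir/file1.txt"
printf '%s\n' 'A' 'B' >"$workdir/file2.txt"

# bash picks an unused descriptor (10 or higher) and stores its number
# in the named variable.
exec {full_sample_fd}<"$workdir/file1.txt"
exec {sample_fd}<"$workdir/file2.txt"
echo "bash picked descriptors $full_sample_fd and $sample_fd"

count=0
last_pair=''
while IFS= read -r -u "$full_sample_fd" full_sample_name \
   && IFS= read -r -u "$sample_fd" sample_name; do
    count=$((count + 1))
    last_pair="$full_sample_name:$sample_name"
done

# Descriptors opened with exec stay open until closed explicitly.
exec {full_sample_fd}<&- {sample_fd}<&-
rm -rf "$workdir"
```

The exact descriptor numbers may differ between runs and systems, which is why the script refers to them only through the variables.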

One more note: you could make this a bit more efficient by not using sed at all. Instead, read the input to be converted into a shell variable and do the replacements there. This is also more correct in the case where your sample_name and full_sample_name values aren't guaranteed to evaluate to themselves when interpreted as regular expressions. It assumes that your input file contains no literal NULs (which, as a shell script, it shouldn't) and that the angle brackets are intended to be literal rather than word-boundary regex characters.

exec {full_sample_fd}<file1.txt
exec {sample_fd}<file2.txt
IFS= read -r -d '' input_file <pipeline.sh

while read -r -u "$full_sample_fd" full_sample_name \
   && read -r -u "$sample_fd" sample_name; do
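  # The replacement is quoted so that bash 5.2+ (patsub_replacement) treats any "&" in it literally.
  output=${input_file//'<full_sample_name>'/"${full_sample_name}"}
  output=${output//'<sample_name>'/"${sample_name}"}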
  printf '%s' "$output" >"${sample_name}.sh"
done

# close the files explicitly
exec {full_sample_fd}>&- {sample_fd}>&-
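
The correctness point can be seen with a value that contains a slash, which would be taken as a delimiter if interpolated into a sed s/// command. The sketch below uses an invented one-line template and invented sample values. It treats the angle brackets as literal text, as the paragraph above assumes.

```shell
#!/bin/bash
# Template text and sample values are invented for this demo.
template='run --in <full_sample_name>.fq --tag <sample_name>'
full_sample_name='data/run1/S1_L001'   # the "/" would break a sed s/// command
sample_name='S1'

# The quoted pattern is matched as a literal string, not as a glob or regex.
# The replacement is inserted as-is, slashes included.
output=${template//'<full_sample_name>'/"$full_sample_name"}
output=${output//'<sample_name>'/"$sample_name"}

printf '%s\n' "$output"
# prints: run --in data/run1/S1_L001.fq --tag S1
```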



Answer 2:


Charles provided a very good answer.

You could use paste to join the lines of the files with some delimiter (that shouldn't appear in the files):

paste -d ":" file1.txt file2.txt | while IFS=":" read -r full samp; do
    do_stuff_with "$full" and "$samp"
done
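
One thing to keep in mind with the pipeline form is that bash normally runs the while loop in a subshell, so variables set inside it are gone once the loop ends. A possible variation, sketched here with invented file contents and counters, feeds the loop from process substitution instead so that it runs in the current shell.

```shell
#!/bin/bash
# Demo data: file contents are invented; ":" must not occur in either file.
workdir=$(mktemp -d)
printf '%s\n' 'full_A' 'full_B' 'full_C' >"$workdir/file1.txt"
printf '%s\n' 'A' 'B' 'C' >"$workdir/file2.txt"

processed=0
summary=''
# paste joins line N of file1.txt and line N of file2.txt with ":".
# Redirecting from <(...) keeps the loop in the current shell.
while IFS=: read -r full samp; do
    processed=$((processed + 1))
    summary+="$full=$samp;"
done < <(paste -d ':' "$workdir/file1.txt" "$workdir/file2.txt")

echo "processed $processed pairs: $summary"
rm -rf "$workdir"
```

Here processed and summary are still set after the loop, which would not be the case with paste piped directly into while.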



Answer 3:


With GNU Parallel it will look like this:

#! /bin/bash

do_sed() {
    sed -e "s/\<full_sample_name\>/$1/g" -e "s/\<sample_name\>/$2/g" pipeline.sh > "$2".sh
}
export -f do_sed   

parallel --xapply do_sed {1} {2} :::: file1.txt file2.txt

The added benefit is that the jobs run in parallel. Depending on your storage system this may speed up the processing: on a RAID 6 array I have seen a 6x speedup by running 10 jobs in parallel. YMMV, so the only way to know for sure is to test and measure.

GNU Parallel is a general parallelizer that makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel



Source: https://stackoverflow.com/questions/29421972/read-multiple-files-in-bash
