How would I loop over pairs of values without repetition in bash?

Question

I'm using a particular program that would require me to examine pairs of variables in a text file by specifying the pairs using indices.

For example:

gcta  --reml-bivar 1 2 --grm test  --pheno test.phen  --out test

Here 1 and 2 correspond to values from the first two columns of a text file. If I had 50 columns and wanted to examine each pair without repetition (1&2, 1&3, 2&3, ... up to 50), what would be the best way to automate this with a loop? Essentially, the script would execute the same command but with a different pair of indices each time, like:

gcta  --reml-bivar 1 3 --grm test  --pheno test.phen  --out test
gcta  --reml-bivar 1 4 --grm test  --pheno test.phen  --out test

... so on and so forth. Thanks!
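For a sense of scale: picking 2 of 50 columns without regard to order gives 50 * 49 / 2 = 1225 pairs, so the command would run 1225 times. A small bash check (count_pairs is an illustrative name) that enumerates the pairs with the i < j rule a nested loop would use, and compares the count with that formula:

```shell
#!/bin/bash
# Count unordered pairs (i, j) with 1 <= i < j <= n by enumerating them,
# then compare with the closed form n * (n - 1) / 2.
count_pairs() {
    local n=$1 i j count=0
    for ((i = 1; i < n; i++)); do
        for ((j = i + 1; j <= n; j++)); do
            count=$((count + 1))
        done
    done
    echo "$count"
}

n=50
echo "enumerated: $(count_pairs "$n")"
echo "formula:    $((n * (n - 1) / 2))"
```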

Answer 1:

Since you haven't shown us any sample input, we're just guessing, but if your input is a list of numbers (extracted from a file or otherwise), then here's an approach:

$ cat combinations.awk
###################
# Calculate all combinations of a set of strings, see
# https://rosettacode.org/wiki/Combinations#AWK
###################

function get_combs(A,B, i,n,comb) {
    ## Default value for r is to choose 2 from the pool of all elements in A.
    ## Can alternatively be set on the command line:-
    ##    awk -v r=<number of items being chosen> -f <scriptname>
    n = length(A)
    if (r=="") r = 2

    comb = ""
    for (i=1; i <= r; i++) { ## First combination of items:
        indices[i] = i
        comb = (i>1 ? comb OFS : "") A[indices[i]]
    }
    B[comb]

    ## While 1st item is less than its maximum permitted value...
    while (indices[1] < n - r + 1) {
        ## loop backwards through all items in the previous
        ## combination of items until an item is found that is
        ## less than its maximum permitted value:
        for (i = r; i >= 1; i--) {
            ## If the equivalently positioned item in the
            ## previous combination of items is less than its
            ## maximum permitted value...
            if (indices[i] < n - r + i) {
                ## increment the current item by 1:
                indices[i]++
                ## Save the current position-index for use
                ## outside this "for" loop:
                p = i
                break}}
        ## Put consecutive numbers in the remainder of the array,
        ## counting up from position-index p.
        for (i = p + 1; i <= r; i++) indices[i] = indices[i - 1] + 1

        ## Store the current combination of items (as a key of B):
        comb = ""
        for (i=1; i <= r; i++) {
            comb = (i>1 ? comb OFS : "") A[indices[i]]
        }
        B[comb]
    }
}

# Input should be a list of strings
{
    split($0,A)
    delete B
    get_combs(A,B)
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (comb in B) {
        print comb
    }
}

$ awk -f combinations.awk <<< '1 2 3 4'
1 2
1 3
1 4
2 3
2 4
3 4

$ while read -r a b; do
    echo gcta  --reml-bivar "$a" "$b" --grm test  --pheno test.phen  --out test
done < <(awk -f combinations.awk <<< '1 2 3 4')
gcta --reml-bivar 1 2 --grm test --pheno test.phen --out test
gcta --reml-bivar 1 3 --grm test --pheno test.phen --out test
gcta --reml-bivar 1 4 --grm test --pheno test.phen --out test
gcta --reml-bivar 2 3 --grm test --pheno test.phen --out test
gcta --reml-bivar 2 4 --grm test --pheno test.phen --out test
gcta --reml-bivar 3 4 --grm test --pheno test.phen --out test

Remove the echo when you're done testing and happy with the output.
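For the question's 50 columns the input line does not need to be typed out: seq -s ' ' 50 prints 1 to 50 on one line. And when only pairs are needed (r = 2), a plain nested loop inside awk yields the same pairs as combinations.awk without the general algorithm. A sketch along those lines (pair_commands is an illustrative name; the echo keeps it a dry run, as in the loop above):

```shell
#!/bin/bash
# Print one (dry-run) gcta command per unordered pair of the numbers 1..n.
# seq -s ' ' puts all indices on one line; awk then emits each pair i < j.
pair_commands() {
    seq -s ' ' "$1" |
    awk '{ for (i = 1; i < NF; i++) for (j = i + 1; j <= NF; j++) print $i, $j }' |
    while read -r a b; do
        echo gcta --reml-bivar "$a" "$b" --grm test --pheno test.phen --out test
    done
}

pair_commands 4
```

pair_commands 50 prints the 1225 commands for 50 columns, in the order 1 2, 1 3, ..., 1 50, 2 3, and so on.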

In case anyone's reading this and wants permutations instead of combinations:

$ cat permutations.awk
###################
# Calculate all permutations of a set of strings, see
# https://en.wikipedia.org/wiki/Heap%27s_algorithm

function get_perm(A,            i, lgth, sep, str) {
    lgth = length(A)
    for (i=1; i<=lgth; i++) {
        str = str sep A[i]
        sep = " "
    }
    return str
}

function swap(A, x, y,  tmp) {
    tmp  = A[x]
    A[x] = A[y]
    A[y] = tmp
}

function generate(n, A, B,      i) {
    if (n == 1) {
        B[get_perm(A)]
    }
    else {
        for (i=1; i <= n; i++) {
            generate(n - 1, A, B)
            if ((n%2) == 0) {
                swap(A, 1, n)
            }
            else {
                swap(A, i, n)
            }
        }
    }
}

function get_perms(A,B) {
    generate(length(A), A, B)
}

###################

# Input should be a list of strings
{
    split($0,A)
    delete B
    get_perms(A,B)
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (perm in B) {
        print perm
    }
}

$ awk -f permutations.awk <<< '1 2 3 4'
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 2 3
1 4 3 2
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 1 3
2 4 3 1
3 1 2 4
3 1 4 2
3 2 1 4
3 2 4 1
3 4 1 2
3 4 2 1
4 1 2 3
4 1 3 2
4 2 1 3
4 2 3 1
4 3 1 2
4 3 2 1

Both of the above scripts use GNU awk's PROCINFO["sorted_in"] to sort the output. If you don't have GNU awk you can still use the scripts as-is, and if you need sorted output, pipe it to sort.
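One caveat for the question's 50 columns: "@ind_str_asc" compares the combinations as strings, so once indices have two digits "1 10" sorts before "1 2". If numeric order matters, sorting on numeric keys is one option; sample_pairs here is just a few hand-written lines for illustration:

```shell
#!/bin/bash
# A few pairs in no particular order.
sample_pairs() {
    printf '%s\n' '1 10' '1 2' '2 3' '10 11' '1 3'
}

# Plain string (byte) order: "1 10" lands before "1 2".
sample_pairs | LC_ALL=C sort

# Numeric order on the first field, then on the second.
sample_pairs | sort -k1,1n -k2,2n
```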

Answer 2:

If I understand you correctly, and you don't want pairs like '1 1', '2 2', ..., nor both '1 2' and '2 1', try this script:

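#!/bin/bash

# i runs over the first index and j over every larger index, so each
# unordered pair is visited once and pairs like "1 1" never occur.
for i in $(seq 1 49); do
    for j in $(seq $((i + 1)) 50); do
        # "$i" "$j" passes two separate arguments ("$i $j" would pass one).
        # A per-pair --out prefix keeps the runs from overwriting each other's output.
        gcta --reml-bivar "$i" "$j" --grm test --pheno test.phen --out "test_${i}_${j}"
    done
done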
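The same double loop can be written with bash's arithmetic for loop, which avoids calling seq and keeps the column count in one variable. A sketch, assuming the number of columns is known in advance (run_pairs and the test_${i}_${j} output prefix are illustrative choices, the latter so that runs do not all write to the same output name); the echo makes it a dry run:

```shell
#!/bin/bash
# Dry run: print one gcta command per unordered pair (i, j), 1 <= i < j <= n_cols.
run_pairs() {
    local n_cols=$1 i j
    for ((i = 1; i < n_cols; i++)); do
        for ((j = i + 1; j <= n_cols; j++)); do
            # Remove "echo" to run gcta for real.
            echo gcta --reml-bivar "$i" "$j" --grm test --pheno test.phen --out "test_${i}_${j}"
        done
    done
}

run_pairs 3
```

For the question, run_pairs 50 produces the 1225 commands.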

Answer 3:

1 and 2 would correspond to values from the first two columns in a text file.

each pair without repetition

So let's walk through the process:

1. Repeat the whole first column of the file as many times as the file has lines.
2. Repeat each value (each line) of the second column as many times as the file has lines.
3. Join the two repeated columns side by side; this gives every combination.
4. Filter out the "repetitions" by comparing the sorted result with the sorted original file and dropping the lines that also appear in it (the original rows). This leaves each pair without repetition.
5. Finally, read the result line by line.

The script:

# create an input file because you didn't provide any
cat << EOF > in.txt
1 a
2 b
3 c
4 d
EOF

# get file length
inlen=$(<in.txt wc -l)

# join the columns
paste -d' ' <(
  # repeat the first column inlen times
  # https://askubuntu.com/questions/521465/how-can-i-repeat-the-content-of-a-file-n-times
  seq "$inlen" |
  xargs -I{} cut -d' ' -f1 in.txt
) <(
  # repeat each line inlen times
  # https://unix.stackexchange.com/questions/81904/repeat-each-line-multiple-times
  awk -v v="$inlen" '{for(i=0;i<v;i++)print $2}' in.txt
) |
# filter out repetitions, i.e. remove the lines that are also in the original file
sort |
comm --output-delimiter='' -3 <(sort in.txt) - |
# read the file line by line
while read -r one two; do
  echo "$one" "$two"
done

This will output:

1 b
1 c
1 d
2 a
2 c
2 d
3 a
3 b
3 d
4 a
4 b
4 c
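The cross join and filter above can also be sketched in pure bash without repeating the columns first: read both columns into arrays, pair every first-column value with every second-column value, and skip the pairs that come from the same line (the original rows that comm removes). This assumes the values within each column are distinct, and cross_pairs is an illustrative name. For the same sample input it prints the same twelve lines, in the same order:

```shell
#!/bin/bash
# Read the two columns into arrays, then pair every first-column value with
# every second-column value that came from a different line.
cross_pairs() {
    local file=$1 i j a b
    local -a first=() second=()
    while read -r a b; do
        first+=("$a")
        second+=("$b")
    done < "$file"
    for ((i = 0; i < ${#first[@]}; i++)); do
        for ((j = 0; j < ${#second[@]}; j++)); do
            # Same line: this is one of the original rows, so skip it.
            ((i == j)) && continue
            echo "${first[i]} ${second[j]}"
        done
    done
}

# Same sample input as in.txt, supplied through process substitution.
cross_pairs <(printf '%s\n' '1 a' '2 b' '3 c' '4 d')
```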

Answer 4:

    #!/bin/bash

    # Build the list of positions 1..N, where N (the second argument)
    # is the length of each combination

    eval rg+=({1..$2})

    #the code builds the script and runs it (eval)

    eval `
    # Write one nested for loop per position of the combination
    for i in ${rg[@]} ; do
    echo "for c$i in {1..$1} ;do " 
    done ;


    # The nested loops alone produce every combination, including
    # ones that repeat a value. So the script now writes an if test
    # that requires every pair of positions to hold different values.


    op1=$2
    op2=$(( $2 - 1 ))
    echo -n "if [ 1 == 1 ] "

    while [ $op1 -gt 1 ]  ; do
    echo -n  \&\& [ '$c'$op1 != '$c'$op2 ]' '
    op2=$(( op2 - 1 ))
    if [ $op2 == 0 ] ; then  
            op1=$(( op1 - 1 ))
            op2=$(( op1 - 1 ))
    fi
    done ;

    echo  ' ; then'
    echo -n "echo "

    for i in ${rg[@]} ; 
    do
    echo -n '$c'$i
    done ;

    echo \;
    echo fi\;

    for i in ${rg[@]} ; do
    echo 'done ;'
    done;`

    example:               range       length
    $ ./combs.bash '{1..2} {a..c} \$ \#' 4
    12ab$
    12ab#
    12acb
    12ac$
    12ac#
    12a$b
    12a$c
    12a$#
    12a#b
    12a#c
    12a#$
    ..........
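The question treats 1 2 and 2 1 as the same pair, while the script above only checks that the positions hold different values. If only unordered selections are wanted, requiring each chosen index to be larger than the previous one rules out repeats and reorderings at the same time, so nothing is generated only to be rejected. A recursive sketch for choosing k of the indices 1 to n (combos is an illustrative name):

```shell
#!/bin/bash
# Print every k-element combination of 1..n in increasing order.
# $1 = n, $2 = how many indices are still to be chosen,
# $3 = smallest index allowed next, remaining args = indices chosen so far.
combos() {
    local n=$1 k=$2 start=$3 i
    shift 3
    if ((k == 0)); then
        echo "$*"
        return
    fi
    # Stop early enough that k indices still fit between i and n.
    for ((i = start; i <= n - k + 1; i++)); do
        combos "$n" $((k - 1)) $((i + 1)) "$@" "$i"
    done
}

combos 4 2 1
```

combos 50 2 1 gives the 1225 index pairs from the question; a larger k gives triples and so on.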

Answer 5:

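      #!/bin/bash
      # Usage: ./neto 'ITEMS' LENGTH
      # Prints every ordered arrangement of LENGTH distinct items.
      # The repeat check counts characters, so items must be single characters.
      len=$2
      eval c=($1)    # expand the brace ranges in $1 into the array c
      per()
      {
          # $1 = items still to choose, $2 = items chosen so far.
          # Prune if the prefix has fewer distinct characters than chosen items (a repeat).
          ((`grep -Poi '[^" ".]'<<<$2|sort|uniq|wc -l` < $((len - ${1}))))&&{ return;}
          # Nothing left to choose: print the finished arrangement.
          (($1 == 0))&&{ echo $2;return;}
          for i in ${c[@]} ; do
              per "$((${1} - 1 ))" "$2 $i"
          done
      }
      per "$2" ""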

      #example
      $ ./neto '{0..3} {a..d} \# \!'  7
      0 1 2 3 a b c
      0 1 2 3 a b d
      0 1 2 3 a b #
      0 1 2 3 a b !
      0 1 2 3 a c b
      0 1 2 3 a c d
      0 1 2 3 a c #
      0 1 2 3 a c !
      0 1 2 3 a d b
      ...
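The grep-based check above counts distinct characters, so it assumes every item is a single character: an item such as 10 is seen as 1 and 0. A sketch that instead remembers which positions of the item list are already in use works for items of any length (arrange, items and used are illustrative names):

```shell
#!/bin/bash
# Print every ordered arrangement of k distinct items taken from the array items.
# used[i] is set while items[i] is part of the arrangement being built.
items=()
used=()

arrange() {
    local k=$1 i
    shift
    if ((k == 0)); then
        echo "$*"
        return
    fi
    for i in "${!items[@]}"; do
        [[ -n ${used[i]} ]] && continue   # skip items already in this arrangement
        used[i]=1
        arrange $((k - 1)) "$@" "${items[i]}"
        used[i]=
    done
}

items=(1 2 10)
arrange 2
```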

Source: https://stackoverflow.com/questions/56911991/how-would-i-loop-over-pairs-of-values-without-repetition-in-bash

Tags: bash, loops, awk