I would like your help trimming a file by removing the columns in which every row has the same value.
# the file I have (tab-delimited, millions of columns)
jack 1 5 9
joh
Not fully tested, but this seems to work for the provided test set. Note that it overwrites the original file, so run it on a copy.
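#!/bin/bash
# change 4 below to match the number of columns
for i in {2..4}; do
    # a column is constant when it holds exactly one distinct value
    if [ $(cut -f "$i" input | sort -u | wc -l) -eq 1 ]; then
        # blank out the constant column, keeping the file tab-delimited
        awk -F'\t' -v OFS='\t' -v field="$i" '{$field="_"; print}' input > tmp2
        mv tmp2 input
    fi
done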
$ cat input
jack 1 5 9
john 3 5 0
lisa 4 5 7
$ ./cnt.sh
$ cat input
jack 1 _ 9
john 3 _ 0
lisa 4 _ 7
I used _ as a placeholder for the constant column to make the output clearer...
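
With millions of columns, running cut and sort once per column gets slow, so here is a rough sketch of a two-pass awk alternative that drops the constant columns instead of marking them. The function name drop_constant_columns is made up for illustration. I am assuming the file is tab-delimited, that every row has the same number of fields, and that the first column should be treated like any other. It has only been checked against the small sample above.

```shell
#!/bin/bash
# drop_constant_columns FILE (hypothetical helper name)
# Prints FILE without the columns whose value is identical on every row.
# FILE is read twice and is never modified.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        # Pass 1: remember the first row, then flag every column that differs from it.
        NR == FNR {
            if (FNR == 1) {
                n = NF
                for (i = 1; i <= n; i++) first[i] = $i
            } else {
                for (i = 1; i <= n; i++)
                    # appending "" forces a string comparison, so 1 and 1.0 count as different
                    if (($i "") != (first[i] "")) varies[i] = 1
            }
            next
        }
        # Pass 2: print only the flagged (non-constant) columns.
        {
            out = ""; sep = ""
            for (i = 1; i <= n; i++)
                if (i in varies) { out = out sep $i; sep = OFS }
            print out
        }
    ' "$1" "$1"
}
```

Call it as drop_constant_columns input > output, which leaves the original file untouched. One caveat: a file with a single row has no varying columns, so every column is dropped and an empty line is printed.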