I would like your help on trimming a file by removing the columns with the same value.
# the file I have (tab-delimited, millions of columns)
jack 1 5 9
joh
You can select the column to cut out like
# using bash/awk
# I had used 1000000 here, as you had written millions of columns but you should adjust it
for cols in `seq 2 1000000` ; do
cut -d DELIMITER -f $cols FILE | awk -v c=$cols '{s+=$0} END {if (s/NR==$0) {printf("%i,",c)}}'
done | sed 's/,$//' > tmplist
cut --complement -d DELIMITER -f `cat tmplist` FILE
But it can be REALLY slow, because it's not optimized, and reads the file several times... so be careful with huge files.
Or you can read the whole file once with awk and select the dumpable columns, then use cut.
cut --complement -d DELIMITER -f `awk '{for (i=1;i<=NF;i++) {sums[i]+=$i}} END {for (i=1;i<=NF; i++) {if (sums[i]/NR==$i) {printf("%i,",c)}}}' FILE | sed 's/,$//'` FILE
HTH