How to trim a file: remove the columns with the same value

天涯浪人 2021-01-05 09:15

I would like your help trimming a file by removing the columns that have the same value in every row.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
joh         


        
8 answers
  •  盖世英雄少女心
    2021-01-05 09:52

    You can first work out which columns to cut, like this:

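    # using bash/awk (cut --complement needs GNU cut)
    # I used 1000000 here because you wrote "millions of columns"; adjust it to your real column count
    # a column is treated as constant when every row equals the value in the first row
    for cols in $(seq 2 1000000) ; do
        cut -d DELIMITER -f "$cols" FILE | awk -v c="$cols" 'NR==1 {first=$0} $0!=first {diff=1; exit} END {if (!diff) printf("%d,",c)}'
    done | sed 's/,$//' > tmplist
    cut --complement -d DELIMITER -f "$(cat tmplist)" FILE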
    

    But it can be REALLY slow: it isn't optimized and reads the whole file once per column, so be careful with huge files.
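
Rather than hard-coding 1000000 in the loop above, the column count can probably be read from the first line of the file. A minimal sketch, assuming a tab-delimited file whose first line has the full number of columns; `count_columns` and the sample data are made up for illustration:

```shell
# Hypothetical helper: number of tab-separated fields on the first line of a file
count_columns() {
    head -n 1 "$1" | awk -F'\t' '{ print NF }'
}

# Made-up sample data with 4 columns
sample=$(mktemp)
printf 'alice\t1\t5\t9\nbob\t1\t6\t7\n' > "$sample"

ncols=$(count_columns "$sample")
echo "$ncols"
# the loop could then start with: for cols in $(seq 2 "$ncols") ; do
```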

    Or you can read the whole file once with awk to find the columns that can be dropped, then use cut.

    cut --complement -d DELIMITER -f "$(awk -F DELIMITER 'NR==1 {n=NF; for (i=1;i<=n;i++) first[i]=$i; next} {for (i=1;i<=n;i++) if ($i!=first[i]) varies[i]=1} END {for (i=1;i<=n;i++) if (!varies[i]) printf("%d,",i)}' FILE | sed 's/,$//')" FILE
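
If GNU `cut --complement` is not available, the same idea can be sketched in awk alone by passing the file twice to a single awk process: the first pass marks the columns whose value changes, the second prints only those. This is an illustration under assumptions (tab-delimited input, every row as wide as the first), not a tuned solution; `drop_constant_columns` and the sample rows are made up.

```shell
# Hypothetical helper: print a tab-delimited file without its constant columns.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                       # pass 1: find columns whose value changes
            if (FNR == 1) { n = NF; for (i = 1; i <= n; i++) first[i] = $i }
            else for (i = 1; i <= n; i++) if ($i != first[i]) varies[i] = 1
            next
        }
        {                                 # pass 2: print only the varying columns
            line = ""; sep = ""
            for (i = 1; i <= n; i++) if (varies[i]) { line = line sep $i; sep = OFS }
            print line
        }
    ' "$1" "$1"
}

# Made-up sample data: only column 2 has the same value in every row
sample=$(mktemp)
printf 'alice\t1\t5\t9\nbob\t1\t6\t7\ncarol\t1\t5\t8\n' > "$sample"
drop_constant_columns "$sample"
```

On the sample, column 2 is dropped and the other three columns are kept. The file is still read twice, so on very large inputs this trades the extra pass for not holding the data in memory.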
    

    HTH
