How to count number of unique values of a field in a tab-delimited text file?

滥情空心 2020-12-23 14:21

I have a text file with a large amount of data which is tab delimited. I want to have a look at the data such that I can see the unique values in a column. For example,

7 Answers
  •  独厮守ぢ
    2020-12-23 14:38

    This script outputs, for every column of the given file, each unique value together with its count. It assumes that the first line of the file is a header line; there is no need to specify the number of fields. Simply save the script as a bash file (.sh) and pass the tab-delimited file to it as a parameter.

    Code

    #!/bin/bash
    # Usage: ./script.sh <tab-delimited file with a header line>
    # For every column, prints the column name followed by each
    # unique value and its count (value_count), tab separated.
    
    awk -F'\t' '
    # Header line: remember the column names.
    (NR==1){
        for(fi=1; fi<=NF; fi++)
            fname[fi]=$fi;
    }
    # Data lines: count occurrences of each value per column.
    (NR!=1){
        for(fi=1; fi<=NF; fi++)
            arr[fname[fi]][$fi]++;
    }
    # Print one output line per column.
    END{
        for(fi=1; fi<=NF; fi++){
            out=fname[fi];
            for (item in arr[fname[fi]])
                out=out"\t"item"_"arr[fname[fi]][item];
            print(out);
        }
    }
    ' "$1"
    

    Execution Example:

    bash> ./script.sh <tab-delimited-file>

    Output Example

    isRef    A_15      C_42     G_24     T_18
    isCar    YEA_10    NO_40    NA_50
    isTv     FALSE_33  TRUE_66
    
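    Note that the arr[col][value] arrays-of-arrays syntax above requires GNU awk 4.0 or later. As a minimal sketch of a more portable variant (assuming the same header-plus-data input format), the counting can be done with awk's simulated multidimensional keys, which also works in mawk and BSD awk:

    #!/bin/bash
    # Portable sketch: composite (column, value) keys instead of arrays of arrays.
    awk -F'\t' '
    # Header line: remember the column names and the column count.
    (NR==1){
        for(fi=1; fi<=NF; fi++)
            fname[fi]=$fi;
        ncol=NF;
        next;
    }
    # Data lines: count each (column, value) pair.
    {
        for(fi=1; fi<=NF; fi++)
            cnt[fi, $fi]++;
    }
    # Print one line per column: name, then value_count pairs.
    END{
        for(fi=1; fi<=ncol; fi++){
            out=fname[fi];
            for (key in cnt){
                split(key, k, SUBSEP);
                if (k[1]+0 == fi)
                    out=out"\t"k[2]"_"cnt[key];
            }
            print out;
        }
    }
    ' "$1"

    For a quick look at a single column only, a pipeline such as cut -f 2 file.tsv | sort | uniq -c (with 2 and file.tsv standing in for the actual column number and file name) prints each distinct value with its count.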
