How to count number of unique values of a field in a tab-delimited text file?

前端未结

关注

 7  710

滥情空心 2020-12-23 14:21

I have a text file with a large amount of data which is tab delimited. I want to have a look at the data such that I can see the unique values in a column. For example,

7条回答

独厮守ぢ (楼主)

2020-12-23 14:38
This script outputs the number of unique values in each column of a given file. It assumes that first line of given file is header line. There is no need for defining number of fields. Simply save the script in a bash file (.sh) and provide the tab delimited file as a parameter to this script.

Code
```
#!/bin/bash

awk '
(NR==1){
    for(fi=1; fi<=NF; fi++)
        fname[fi]=$fi;
} 
(NR!=1){
    for(fi=1; fi<=NF; fi++) 
        arr[fname[fi]][$fi]++;
} 
END{
    for(fi=1; fi<=NF; fi++){
        out=fname[fi];
        for (item in arr[fname[fi]])
            out=out"\t"item"_"arr[fname[fi]][item];
        print(out);
    }
}
' $1
```
Execution Example:

bash> ./script.sh

Output Example
```
isRef    A_15      C_42     G_24     T_18
isCar    YEA_10    NO_40    NA_50
isTv     FALSE_33  TRUE_66
```
0 讨论(0)

查看其它7个回答
发布评论:

提交评论
- 加载中...