How to count the number of unique values of a field in a tab-delimited text file?

滥情空心 2020-12-23 14:21

I have a large tab-delimited text file. I want to look at the data so that I can see the unique values in a column. For example,

7 answers
    谎友^ (original poster)
    2020-12-23 14:33

    Here is a bash script that fully answers the (revised) original question: given any .tsv file, it prints a synopsis of each column in turn, listing every distinct value together with the number of times it occurs. Apart from bash itself, it uses only standard *nix/macOS tools: sed, tr, wc, cut, sort, and uniq.

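    #!/bin/bash
    # Usage: $0 filename
    # Prints every distinct value in each column of a tab-separated (.tsv)
    # file, together with the number of times it occurs.

    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
      echo "Usage: $0 file.tsv" >&2
      exit 1
    fi
    FILE="$1"

    # Number of columns = number of tabs in the first line + 1.
    # sed quits after line 1, so a large file is not read to the end here.
    tabs=$(sed -n '1{p;q;}' "$FILE" | tr -cd '\t' | wc -c)
    cols=$((tabs + 1))

    for ((i = 1; i <= cols; i++)); do
      echo "Column $i ::"
      # sort groups identical values together; uniq -c counts each group.
      cut -f "$i" < "$FILE" | sort | uniq -c
      echo
    done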
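The title asks for the number of unique values, while the script above lists each value with its count. A minimal sketch that prints only the number of distinct values per column could use a single awk pass instead of one cut | sort pipeline per column. It assumes the distinct (column, value) pairs fit in memory; the function name count_unique_per_column is hypothetical, chosen only for illustration.

```bash
#!/bin/bash
# Sketch (hypothetical helper): print how many distinct values each
# column of a tab-delimited file contains, reading the file only once.
# Assumes the distinct (column, value) pairs fit in memory. A header
# line, if present, is counted as one of the values.
count_unique_per_column() {
  awk -F '\t' '
    {
      for (i = 1; i <= NF; i++)
        if (!seen[i, $i]++)   # first occurrence of this value in column i
          distinct[i]++
      if (NF > maxnf) maxnf = NF
    }
    END {
      for (i = 1; i <= maxnf; i++)
        printf "Column %d :: %d distinct\n", i, distinct[i]
    }
  ' "$1"
}

# Run only when a file name is given, e.g. ./count_unique.sh data.tsv
if [ "$#" -eq 1 ]; then
  count_unique_per_column "$1"
fi
```

The trade-off: awk keeps one array entry per distinct (column, value) pair, so memory grows with the number of distinct values, whereas sort can fall back to temporary files on very large input. For a single column, a one-liner such as `cut -f 2 data.tsv | sort -u | wc -l` gives the same count (column 2 and data.tsv are only examples).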
    
