Count occurrences of character per line/field on Unix

难免孤独 2020-12-23 16:19

Given a file with data like this (i.e. a stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200


        
10 answers
  • 2020-12-23 16:49

    No need for awk or perl; bash and standard Unix utilities are enough:

    cat file | tr -c -d "t\n" | cat -n |
      { echo "count   lineNum"
        while read num data; do
          test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
        done; }
    

    And for a particular column:

    cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
      { echo -e "count lineNum"
        while read num data; do
          test ${#data} -gt 0 && printf "%4d   %5d\n" ${#data} $num
        done; }
    

    And we can even avoid tr and the cats:

    echo "count   lineNum"
    num=1
    while read data; do
      new_data=${data//t/}
      count=$((${#data}-${#new_data}))
      test $count -gt 0 && printf "%4d   %5d\n" $count $num
      num=$(($num+1))
    done < file
    

    And even the cut:

    echo "count   lineNum"
    num=1
    # Setting IFS only for the read command avoids saving and restoring it.
    while IFS="|" read -r -a array_data; do
      data=${array_data[1]}   # second field (bash arrays are zero-based)
      new_data=${data//t/}
      count=$((${#data}-${#new_data}))
      test $count -gt 0 && printf "%4d   %5d\n" $count $num
      num=$(($num+1))
    done < file
    
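    The parameter-expansion trick above can be wrapped in a function so that the character and the column become arguments. This is only a sketch: the name count_char and its argument order are made up for illustration, it assumes bash and "|"-separated input, and a column of 0 (or no column) means the whole line.

```shell
#!/usr/bin/env bash
# count_char CHAR [COLUMN] < input   (hypothetical helper, not from the answer)
# Prints "count lineNum" for every line where CHAR occurs at least once.
# COLUMN is 1-based; 0 or omitted means the whole line.
count_char() {
  local ch=$1 col=${2:-0} num=0 line data stripped count
  local -a fields
  while IFS= read -r line || [[ -n $line ]]; do
    num=$((num + 1))
    if (( col > 0 )); then
      IFS='|' read -r -a fields <<< "$line"
      data=${fields[col-1]-}          # empty if the column does not exist
    else
      data=$line
    fi
    stripped=${data//"$ch"/}          # delete every CHAR; quoting keeps it literal
    count=$(( ${#data} - ${#stripped} ))
    if (( count > 0 )); then
      printf '%4d   %5d\n' "$count" "$num"
    fi
  done
}

# Sample run on the data from the question, restricted to column 2.
printf '%s\n' 'sid|storeNo|latitude|longitude' \
              '2tt|1|-28.0372000t0|153.42921670' \
              '9|2t|-33tt.85t09t0000|15t1.03274200' | count_char t 2
```

    With column 2 the sample run should report one t on lines 1 and 3 and skip line 2, whose second field is just "1".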
  • 2020-12-23 16:49
     $ cat -n test.txt
     1  test 1
     2  you want
     3  void
     4  you don't want
     5  ttttttttttt
     6  t t t t t t
    
     $ awk '{n=split($0,c,"t")-1;if (n>0) print n,NR}' test.txt
     2 1
     1 2
     2 4
     11 5
     6 6
    
  • 2020-12-23 16:49

    You could also split the line or field on "t"; the number of resulting pieces minus 1 is the count. Set the col variable to 0 for the whole line, or 1 through 4 for an individual column:

    awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
        print "count", "lineNum"
    }{
        n = split($col, a, "t"); if (n > 0) n--; print n, NR   # split returns 0 for an empty string
    }
    ' stores.dat
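
    To see what the col switch does, here is a small sketch that wraps the same split idea in a shell function and runs it on the sample data from the question. The function name count_by_split is hypothetical, and the code uses the return value of split() rather than length() on the array, on the assumption that not every awk accepts an array argument to length().

```shell
# count_by_split COL < input   (hypothetical wrapper; COL 0 = whole line)
count_by_split() {
  awk -F'|' -v col="${1:-0}" '{
    n = split($col, parts, "t")   # n pieces means n - 1 separators
    if (n > 0) n--                # split returns 0 for an empty string
    print n, NR
  }'
}

sample='sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200'

printf '%s\n' "$sample" | count_by_split 0   # whole line
printf '%s\n' "$sample" | count_by_split 3   # latitude column only
```

    For the whole line this should give counts 4, 3 and 6; for column 3 it should give 2, 1 and 4, since only the t's inside the latitude field are counted.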
    
  • 2020-12-23 16:52
    perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat
    

    Another perl answer, yay! The tr/t// operator returns the number of characters it matched on that line, in other words how many times 't' occurs. ++$x keeps track of the line number.

  • 2020-12-23 16:55
    grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1
    

    gives almost exactly the output you want:

      4 1
      3 2
      6 3
    

    Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
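
    A slightly trimmed variant of the same pipeline, as a sketch: grep already emits matches in line order, so the sort can probably be dropped, and cutting before uniq keeps only the line number. The wrapper name count_t_grep is made up here, and -o is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX). Lines without any t do not appear in the output at all.

```shell
# count_t_grep FILE   (hypothetical wrapper name)
count_t_grep() {
  grep -n -o 't' "$1" |        # one "lineNum:t" line per match
    cut -d: -f1 |              # keep only the line number
    uniq -c |                  # collapse repeats into "count lineNum"
    awk '{ print $1, $2 }'     # drop the padding that uniq adds
}

# Sample run on a temporary copy of the data from the question.
tmp=$(mktemp)
printf '%s\n' 'sid|storeNo|latitude|longitude' \
              '2tt|1|-28.0372000t0|153.42921670' \
              '9|2t|-33tt.85t09t0000|15t1.03274200' > "$tmp"
count_t_grep "$tmp"
rm -f "$tmp"
```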

  • 2020-12-23 17:03
    awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat
    

    The call to gsub() deletes everything in the line that is not a t; the script then prints the length of what remains, followed by the current line number.

    Want to do it just for column 2? (Note that this variant prints the line number first, then the count.)

    awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
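
    Since gsub() returns the number of substitutions it made, the count can also be read off directly without deleting anything. The following is a sketch under a few assumptions: the function name count_char_gsub is invented for illustration, the character is a plain letter or digit (it is used as a regular expression), and a column of 0 means the whole line.

```shell
# count_char_gsub CHAR [COL] < input   (hypothetical helper)
count_char_gsub() {
  awk -F'|' -v ch="$1" -v col="${2:-0}" '{
    s = $col                  # work on a copy so the record stays untouched
    n = gsub(ch, "&", s)      # "&" puts each match back; n = number of matches
    print n, NR
  }'
}

# Sample run: count "t" in column 2 of the data from the question.
printf '%s\n' 'sid|storeNo|latitude|longitude' \
              '2tt|1|-28.0372000t0|153.42921670' \
              '9|2t|-33tt.85t09t0000|15t1.03274200' | count_char_gsub t 2
```

    Replacing each match with "&" leaves the text as it was, so the same record could still be printed or processed further afterwards.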
    