Count occurrences of character per line/field on Unix

难免孤独 2020-12-23 16:19

Given a file with data like this (i.e. the stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200


        
10 Answers
  • 2020-12-23 17:05
    cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "t"}; {print NF-1}'
    

    Here $1 selects the column whose characters you want to count; use $2, $3, and so on for other columns.
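
    The two-stage pipeline above can also be collapsed into a single awk call. What follows is a minimal sketch rather than the answerer's code: the helper name count_in_column and the variables col and ch are hypothetical, and ch is assumed to be a single non-space character. It relies on split() returning the number of pieces, which is one more than the number of separators, or 0 for an empty field.

    ```shell
    # Hypothetical helper: count occurrences of a single character ($2)
    # in one '|'-separated column ($1) of the data read from stdin.
    count_in_column() {
        awk -F'|' -v col="$1" -v ch="$2" '{
            n = split($col, parts, ch)   # pieces = separators + 1 (0 if empty)
            print (n > 0 ? n - 1 : 0)
        }'
    }

    # Sample rows copied from the question; counts "t" in column 1.
    printf '%s\n' \
        'sid|storeNo|latitude|longitude' \
        '2tt|1|-28.0372000t0|153.42921670' \
        '9|2t|-33tt.85t09t0000|15t1.03274200' |
        count_in_column 1 t
    ```

    Passing 3 instead of 1 counts in the latitude column instead.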

  • 2020-12-23 17:06

    One possible solution using Perl:

    Content of script.pl:

    use warnings;
    use strict;
    
    ## Check arguments:
    ## 1.- Input file
    ## 2.- Char to search.
    ## 3.- (Optional) field to search. If blank, zero or bigger than number
    ##     of columns, default to search char in all the line.
    (@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);
    
    my ($char,$column);
    
    ## Get values of arguments.
    if ( @ARGV == 3 ) {
            ($char, $column) = splice @ARGV, -2;
    } else {
            $char = pop @ARGV;
            $column = 0;
    }
    
    ## Check that $char is a single non-whitespace character and that
    ## $column contains only digits.
    die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/; 
    
    print qq[count\tlineNum\n];
    
    while ( <> ) {
            ## Remove last '\n'
            chomp;
    
            ## Get fields.
            my @f = split /\|/;
    
            ## If column is a valid one, select it to the search.
            if ( $column > 0 and $column <= scalar @f ) {
                    $_ = $f[ $column - 1];
            }
    
            ## Count.
            my $count = eval qq[tr/$char/$char/];
    
            ## Print result.
            printf qq[%d\t%d\n], $count, $.;
    }
    

    The script accepts three parameters:

    1. Input file
    2. Char to search
    3. Column to search: if it is omitted, zero, or larger than the number of columns, the script searches the whole line.

    Running the script without arguments:

    perl script.pl
    Usage: perl script.pl input-file char [column]
    

    With arguments, and the resulting output:

    Here 0 is not a valid column, so the script searches the whole line.

    perl script.pl stores.dat 't' 0
    count   lineNum
    4       1
    3       2
    6       3
    

    Here it searches in column 1.

    perl script.pl stores.dat 't' 1
    count   lineNum
    0       1
    2       2
    0       3
    

    Here it searches in column 3.

    perl script.pl stores.dat 't' 3
    count   lineNum
    2       1
    1       2
    4       3
    

    th is not a single character, so the script rejects it.

    perl script.pl stores.dat 'th' 3
    Bad input
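
    The counting step in the script relies on tr/$char/$char/ returning the number of matched characters. A rough shell analogue, offered as a sketch and not as a port of the script: the function name count_char_tr is hypothetical, argument validation and column selection are omitted, and the character is assumed to be an ordinary single-byte one (not a backslash, hyphen, or bracket). tr -cd deletes everything except the target character and wc -c counts what is left.

    ```shell
    # Hypothetical helper: for each line on stdin, print "<count>\t<lineNum>"
    # for the single character given as $1.
    count_char_tr() {
        lineno=0
        while IFS= read -r line || [ -n "$line" ]; do
            lineno=$((lineno + 1))
            # Delete every character except $1, then count the remaining bytes.
            # The arithmetic expansion strips any padding that wc may add.
            count=$(( $(printf '%s' "$line" | tr -cd "$1" | wc -c) ))
            printf '%d\t%d\n' "$count" "$lineno"
        done
    }

    # Sample rows copied from the question.
    printf '%s\n' \
        'sid|storeNo|latitude|longitude' \
        '2tt|1|-28.0372000t0|153.42921670' \
        '9|2t|-33tt.85t09t0000|15t1.03274200' |
        count_char_tr t
    ```

    This starts three processes per line, so it is only reasonable for small files.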
    
  • 2020-12-23 17:09

    To count occurrences of a character per line:

    $ awk -F 't' '{print NF-1, NR}'  input.txt
    4 1
    3 2
    6 3
    

    This sets the field separator to the character being counted and relies on the fact that the number of fields is one greater than the number of separators.
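
    Two edge cases are worth knowing. For an empty line awk sets NF to 0, so NF-1 prints -1. Also, some awk implementations (gawk in compatibility mode, for example) treat a lone t after -F as a tab, which matters when t is the character being counted. A minimal sketch of a guarded variant follows; the helper name count_per_line is hypothetical, and the character is assumed to be a single non-space character. Setting FS with -v sidesteps the -F special case.

    ```shell
    # Hypothetical helper: print "<count> <lineNum>" for character $1 on stdin.
    count_per_line() {
        awk -v FS="$1" '{
            n = (NF ? NF - 1 : 0)   # an empty line has NF == 0
            print n, NR
        }'
    }

    # The second input line is empty; the guard reports 0 for it, not -1.
    printf '%s\n' 'latitude' '' 'tt' | count_per_line t
    ```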

    To count occurrences in a particular column, cut out that column first:

    $ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
    1 1
    0 2
    1 3
    
    $ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
    2 1
    1 2
    4 3
    
  • 2020-12-23 17:10

    To count occurrences of a character per line, you can do:

    awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
    count lineNum
    4       1
    3       2
    6       3
    

    To count occurrences of a character per field/column, you can do:

    column 2:

    awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
    count lineNum
    1       1
    0       2
    1       3
    

    column 3:

    awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
    count lineNum
    2       1
    1       2
    4       3
    
    • The gsub() function returns the number of substitutions it made, so that return value is the count.
    • NR holds the current line number, so it is printed alongside the count.
    • To count within a particular field, set the variable fld to the number of the field whose occurrences you want.
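
    The commands above hard-code t in the regular expression, and because gsub() edits its target in place they also strip the character from the field. A minimal sketch that takes the character as a variable and counts on a copy: fld comes from the answer, while ch, s, and count_field are hypothetical names, and ch is assumed not to be a regex metacharacter. Since $0 is the whole record, passing 0 as the field counts over the entire line.

    ```shell
    # Hypothetical helper: count character $2 in field $1 (0 = whole line)
    # of '|'-separated data on stdin, printing "<count>\t<lineNum>".
    count_field() {
        awk -F'|' -v fld="$1" -v ch="$2" '
            BEGIN { print "count\tlineNum" }
            {
                s = $fld                       # work on a copy, leave $0 intact
                print gsub(ch, "", s) "\t" NR  # gsub() returns the match count
            }'
    }

    # Sample rows copied from the question; counts "t" in column 3.
    printf '%s\n' \
        'sid|storeNo|latitude|longitude' \
        '2tt|1|-28.0372000t0|153.42921670' \
        '9|2t|-33tt.85t09t0000|15t1.03274200' |
        count_field 3 t
    ```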