Group files and pipe to awk command

Question

I have files in a directory; they are named using YYYY_MM_DD:

-rw-r--r-- 1 root root 497186 Apr 21 13:17 2012_03_25
-rw-r--r-- 1 root root 490558 Apr 21 13:17 2012_03_26
-rw-r--r-- 1 root root 488797 Apr 21 13:17 2012_03_27
-rw-r--r-- 1 root root 316290 Apr 21 13:17 2012_03_28
-rw-r--r-- 1 root root 490081 Apr 21 13:17 2012_03_29
-rw-r--r-- 1 root root 486621 Apr 21 13:17 2012_03_30
-rw-r--r-- 1 root root 490904 Apr 21 13:17 2012_03_31
-rw-r--r-- 1 root root 491788 Apr 21 13:17 2012_04_01
-rw-r--r-- 1 root root 488630 Apr 21 13:17 2012_04_02

The first column within the file is a number, and I am using the following awk command to take an average of that first column.

awk -F, '{ x += $1 } END { print x/NR }' MyFile

Using the same command, I can pass two files to awk to get the overall average of both files combined.

awk -F, '{ x += $1 } END { print x/NR }' File1 File2
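To make the "as a whole" behaviour concrete, here is a small sketch with made-up data (the file contents and the temporary directory are assumptions for illustration only). Because NR keeps counting across all input files, the result is the average over every line of both files, not the average of the two per-file averages.

```shell
#!/bin/sh
# Hypothetical sample data: File1 has two rows, File2 has one row.
tmp=$(mktemp -d)
printf '10,a\n20,b\n' > "$tmp/File1"
printf '60,c\n'       > "$tmp/File2"

# NR is the total number of records read across all files,
# so this computes (10 + 20 + 60) / 3.
combined=$(awk -F, '{ x += $1 } END { print x/NR }' "$tmp/File1" "$tmp/File2")
echo "$combined"

rm -rf "$tmp"
```

With this data the per-file averages would be 15 and 60, while the combined command prints 30.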

What I want to do is this...

I want to get all the files in my directory, and group them per month, then pass all the files for the month to the awk command.

So, going by the sample data above, there are 7 files in March, and I would want all 7 files to be passed to my awk command like this:

awk -F, '{ x += $1 } END { print x/NR }' File1 File2 File3 File4 File5 File6 File7

Then likewise for April's set.

Answer 1:

Do you need to accomplish this with awk alone, or can you use shell file globbing? For example:

awk -F, '{ x += $1 } END { print x/NR }' 2012_03_[0-3][0-9]

will get all the March files.

You could also use 2012_03*, but that pattern is less specific than the one above.
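As a rough illustration of the difference between the two patterns, the sketch below creates a few empty files in a temporary directory (the file names, including the stray 2012_03_25.bak, are hypothetical) and expands both globs.

```shell
#!/bin/sh
# Hypothetical directory with two March files, a stray backup, and one April file.
tmp=$(mktemp -d)
touch "$tmp/2012_03_25" "$tmp/2012_03_26" "$tmp/2012_03_25.bak" "$tmp/2012_04_01"

# The stricter pattern requires exactly two trailing characters, both digits.
strict=$(cd "$tmp" && echo 2012_03_[0-3][0-9])
# The looser pattern accepts anything after the prefix, including the .bak file.
loose=$(cd "$tmp" && echo 2012_03*)

echo "strict: $strict"
echo "loose:  $loose"

rm -rf "$tmp"
```

Here the stricter pattern expands to the two data files only, while 2012_03* also picks up the backup file, which would then be fed to awk.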

Edit

You can use a shell script like this:

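DIR="/tmp/tmp"
# Extract the distinct YYYY_MM prefixes, then run awk once per month.
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
  # output/dir must already exist
  awk -F, '{ x += $1 } END { print x/NR }' "$DIR/${month}"_[0-3][0-9] > output/dir/SUM_"${month}"
done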

As always, there are a few caveats. File names containing spaces will break it, because the for loop word-splits the output of find. You'll also get errors if the directory contains files that don't conform to the YYYY_MM_DD format, but that shouldn't affect the results for the months that do. Let me know if those constraints are not acceptable and I'll think on it a little more.
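If those caveats matter, one possible alternative (a sketch, not part of the answer above) is to let a stricter glob select only conforming names and have a single awk run group the rows by month, using awk's built-in FILENAME and FNR variables. The sample directory and its contents below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample directory; point dir at the real one instead.
dir=$(mktemp -d)
printf '10,a\n20,b\n' > "$dir/2012_03_30"
printf '30,c\n'       > "$dir/2012_03_31"
printf '5,d\n15,e\n'  > "$dir/2012_04_01"
printf '999,x\n'      > "$dir/not a data file"   # not matched by the glob

# One awk run over every YYYY_MM_DD file. The month key comes from the
# file name, so no outer shell loop or word splitting is involved.
result=$(awk -F, '
    FNR == 1 { month = FILENAME; sub(/.*\//, "", month); month = substr(month, 1, 7) }
    { sum[month] += $1; n[month]++ }
    END { for (m in sum) print m, sum[m] / n[m] }
' "$dir"/[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9] | sort)
echo "$result"

rm -rf "$dir"
```

With the sample data this prints "2012_03 20" and "2012_04 10". The trade-off is that all months end up in one output stream rather than one SUM_ file per month.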

Answer 2:

In Perl you could do it like this:

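#!/usr/bin/env perl
use strict;
use warnings;

# Directory to scan; defaults to the current directory.
my $dir = shift || ".";
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
my @files = grep { /\d{4}_\d{2}_\d{2}/ } readdir($dh);
closedir($dh);

my %hash;
foreach my $file (@files)
{
    my ($year_month) = $file =~ /(\d{4}_\d{2})/;
    open(my $fh, "<", "$dir/$file") or die "Cannot open $dir/$file: $!";
    while (my $line = <$fh>)
    {
        # Capture the leading number (the first column); skip lines without one.
        if (my ($col) = $line =~ /^\s*(-?\d+(?:\.\d+)?)/)
        {
            $hash{$year_month}{count}++;
            $hash{$year_month}{sum} += $col;
        }
    }
    close($fh);
}

# Print one average per month, in chronological order.
foreach my $year_month (sort keys %hash)
{
    my $avg = $hash{$year_month}{sum} / $hash{$year_month}{count};
    print "$year_month : $avg\n";
}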

This could probably be shorter, but this way you have a nice hash data structure in case you want to calculate things differently later. Call it like this:

script.pl /path/to/dir

EDIT: fixed a bug where the directory was not prepended to the file path.

Source: https://stackoverflow.com/questions/10262237/group-files-and-pipe-to-awk-command

Tags

Linux

bash

awk

xargs