Question
I have files in a directory, named using the pattern YYYY_MM_DD:
-rw-r--r-- 1 root root 497186 Apr 21 13:17 2012_03_25
-rw-r--r-- 1 root root 490558 Apr 21 13:17 2012_03_26
-rw-r--r-- 1 root root 488797 Apr 21 13:17 2012_03_27
-rw-r--r-- 1 root root 316290 Apr 21 13:17 2012_03_28
-rw-r--r-- 1 root root 490081 Apr 21 13:17 2012_03_29
-rw-r--r-- 1 root root 486621 Apr 21 13:17 2012_03_30
-rw-r--r-- 1 root root 490904 Apr 21 13:17 2012_03_31
-rw-r--r-- 1 root root 491788 Apr 21 13:17 2012_04_01
-rw-r--r-- 1 root root 488630 Apr 21 13:17 2012_04_02
The first column in each file is a number, and I use the following awk
command to average that first column:
awk -F, '{ x += $1 } END { print x/NR }' MyFile
Using the same command, I can pass two files to awk to get the overall average across both files:
awk -F, '{ x += $1 } END { print x/NR }' File1 File2
What I want to do is this:
I want to take all the files in my directory, group them by month, and pass all the files for each month to the awk command.
With the sample data above, there are 7 files for March, so I would want all 7 files passed to my awk command like this:
awk -F, '{ x += $1 } END { print x/NR }' File1 File2 File3 File4 File5 File6 File7
Then likewise for April's set.
Answer 1:
Do you want to accomplish this with awk alone, or can you use file globbing? For example:
awk -F, '{ x += $1 } END { print x/NR }' 2012_03_[0-3][0-9]
will get all the March files.
You could also use 2012_03*
but that's less specific in its globbing pattern than the above one.
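To make the difference between the two patterns concrete, here is a small sketch. The scratch directory and the stray file name 2012_03_summary are made up for the demonstration; they are not part of the original question.

```shell
#!/bin/sh
# Hypothetical demo: compare the two glob patterns in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/2012_03_25" "$demo_dir/2012_03_26" "$demo_dir/2012_03_summary"

# Each expansion runs in a subshell, so the caller's working directory
# is left untouched.
strict=$(cd "$demo_dir" && echo 2012_03_[0-3][0-9])
loose=$(cd "$demo_dir" && echo 2012_03*)

echo "strict: $strict"
echo "loose:  $loose"

rm -r "$demo_dir"
```

The stricter pattern should list only the two dated files, while 2012_03* also picks up 2012_03_summary, which would then be fed to awk along with the real data.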
Edit
You can use a shell script like this:
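DIR="/tmp/tmp"
# Collect the distinct YYYY_MM prefixes of the files in $DIR.
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
    # Average the first column across all of that month's files.
    awk -F, '{ x += $1 } END { print x/NR }' "$DIR/${month}"_[0-3][0-9] > output/dir/SUM_"${month}"
done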
As always, there are a few caveats. File names containing spaces will break the loop, because the output of find is word-split by the shell. Files in the directory that don't conform to the YYYY_MM_DD format will produce errors, but they shouldn't affect the output for the files that do conform. Let me know if those constraints are not acceptable and I'll think on it a little more.
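If those caveats matter, one possible variant is sketched below. It is only an illustration under a few assumptions: the function name monthly_averages and the "YYYY_MM average" output format are invented here, and results go to standard output rather than to per-month files. It relies on the shell's own glob instead of find and sed, so non-conforming names are never matched and a directory path containing spaces stays intact because every expansion is quoted.

```shell
#!/bin/sh
# monthly_averages DIR: print "YYYY_MM average" for each month found in DIR.
# The function name and output format are illustrative assumptions.
monthly_averages() {
    dir=$1
    prev=""
    # The glob only matches names shaped like YYYY_MM_DD, so other files
    # are never seen; quoting "$dir" and "$f" keeps spaces intact.
    for f in "$dir"/[0-9][0-9][0-9][0-9]_[0-1][0-9]_[0-3][0-9]; do
        [ -f "$f" ] || continue     # also skips the unexpanded pattern
        name=${f##*/}               # strip the directory part
        month=${name%_*}            # YYYY_MM_DD -> YYYY_MM
        # Glob results are sorted, so files of one month are adjacent
        # and each month needs to be handled only once.
        [ "$month" = "$prev" ] && continue
        prev=$month
        # Same averaging program as in the question, with the month
        # passed in so it can be printed next to the result.
        awk -F, -v m="$month" '{ x += $1 } END { if (NR) print m, x/NR }' \
            "$dir/$month"_[0-3][0-9]
    done
}
```

Calling monthly_averages "/tmp/tmp" would print one line per month; redirecting inside the loop instead would reproduce the per-month SUM_ files of the script above.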
Answer 2:
In Perl you could do it like this:
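#!/usr/bin/env perl
use strict;
use warnings;

# Directory to scan; defaults to the current directory.
my $dir = shift || ".";
opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
# Keep only names of the form YYYY_MM_DD.
my @files = grep { /^\d{4}_\d{2}_\d{2}$/ } readdir($dh);
closedir($dh);

my %hash;
foreach my $file (@files)
{
    # The YYYY_MM prefix is the grouping key.
    my ($year_month) = $file =~ /^(\d{4}_\d{2})/;
    open(my $fh, "<", "$dir/$file") or die "Cannot open $dir/$file: $!";
    while (my $line = <$fh>)
    {
        # Capture the numeric first field; skip lines that do not start with one.
        my ($col) = $line =~ /^\s*(-?\d+(?:\.\d+)?)\s*(?:,|$)/;
        next unless defined $col;
        $hash{$year_month}{count}++;
        $hash{$year_month}{sum} += $col;
    }
    close($fh);
}

foreach my $year_month (sort keys %hash)
{
    my $avg = $hash{$year_month}{sum} / $hash{$year_month}{count};
    print "$year_month : $avg\n";
}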
This could probably be shorter, but this way you have a nice hash data structure in case you want to calculate it differently later. Call it like:
script.pl /path/to/dir
EDIT: fixed a bug where the directory was not prepended to the file path.
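For comparison, the same hash-per-month idea can be sketched with awk's associative arrays in a single pass. This is an assumption-laden illustration, not part of either answer: the function name averages_by_month is hypothetical, and unlike the Perl script it counts every line, as the awk command in the question does, so a non-numeric first field contributes 0 to the sum.

```shell
#!/bin/sh
# averages_by_month FILE...: one awk process, one pass over all files.
# The function name is a hypothetical label for this sketch.
averages_by_month() {
    awk -F, '
        FNR == 1 {
            # First line of a new input file: derive the YYYY_MM key
            # from the file name.
            key = FILENAME
            sub(/.*\//, "", key)      # drop any leading directory
            key = substr(key, 1, 7)   # "2012_03_25" -> "2012_03"
        }
        { sum[key] += $1; count[key]++ }
        END {
            # Array traversal order is unspecified, hence the sort below.
            for (k in sum) print k " : " sum[k] / count[k]
        }
    ' "$@" | sort
}
```

It could be called as averages_by_month /path/to/dir/2012_*, which prints one "YYYY_MM : average" line per month in the same format as the Perl script.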
Source: https://stackoverflow.com/questions/10262237/group-files-and-pipe-to-awk-command