Question
I have a file like the one below with n rows. I want to total the values in the 3rd column and distribute the rows across 3 different files, based on each file's share of that sum.
For example, the 3rd-column values sum to 516, and dividing that by 3 gives 172.
So I want to add rows to the first file as long as its total does not exceed 172, do the same for the 2nd file, and move all remaining rows to the third file.
Input file
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
l ah 56
o aj 76
l ai 12
q al 09
d pl 34
e ik 30
f ll 10
g dl 15
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19
Expected output
file1 (total: 163)
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
file2 (total: 153)
l ah 56
o aj 76
l ai 12
q al 9
file3 (total: 200)
d pl 34
e ik 30
f ll 10
g dl 15
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19
Answer 1:
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
FNR==NR{
  sum+=$NF
  next
}
FNR==1{
  count=sum/3
}
{
  curr_sum+=$NF
}
(curr_sum>=count || FNR==1) && fileCnt<=2{
  close(out_file)
  out_file="file" ++fileCnt
  curr_sum=$NF
}
{
  print > (out_file)
}' Input_file Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk ' ##Start of the awk program.
FNR==NR{ ##FNR==NR is TRUE only while Input_file is being read for the first time.
  sum+=$NF ##Add the last field of every line to sum, giving the total for the whole Input_file.
  next ##Skip all remaining statements for this line.
}
FNR==1{ ##First line of the 2nd read of Input_file.
  count=sum/3 ##Set count to sum/3, the target total per output file.
}
{
  curr_sum+=$NF ##Add the last field to curr_sum, the running total for the current output file.
}
(curr_sum>=count || FNR==1) && fileCnt<=2{ ##If the running total is >= count OR this is the first line (of the 2nd read), AND fewer than 3 output files have been started.
  close(out_file) ##Close the previous output file; may NOT be needed since there are only 3 output files.
  out_file="file" ++fileCnt ##Build the next output file name.
  curr_sum=$NF ##Restart the running total from the last field of the current line.
}
{
  print > (out_file) ##Print the current line into the current output file.
}' Input_file Input_file ##Input_file is named twice so that it is read twice.
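To sanity-check the result, the following sketch runs the same command on the sample rows from the question and then sums the last column of each output file. The scratch directory and the final summing step are additions for illustration and are not part of the answer above.

```shell
# Work in a scratch directory so the generated file1..file3 do not overwrite anything.
cd "$(mktemp -d)" || exit 1

# Sample rows from the question; the 3rd column sums to 516.
cat > Input_file <<'EOF'
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
l ah 56
o aj 76
l ai 12
q al 09
d pl 34
e ik 30
f ll 10
g dl 15
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19
EOF

# The two-pass split from the answer, unchanged.
awk '
FNR==NR{
  sum+=$NF
  next
}
FNR==1{
  count=sum/3
}
{
  curr_sum+=$NF
}
(curr_sum>=count || FNR==1) && fileCnt<=2{
  close(out_file)
  out_file="file" ++fileCnt
  curr_sum=$NF
}
{
  print > (out_file)
}' Input_file Input_file

# Sum the last column per output file; sort because "for (f in total)" has no fixed order.
awk '{ total[FILENAME] += $NF } END { for (f in total) print f, total[f] }' file1 file2 file3 | sort
# file1 163
# file2 153
# file3 200
```

The totals match the ones listed in the question's expected output.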
Answer 2:
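awk '{ L[nr++]=$0; sum+=$3 }
END{ sumpf=sum/3; sum=0; file=1;
  for(i=0; i<nr; i++) { split(L[i],a);   # index loop: "for (i in L)" has no guaranteed order
    if ((sum+a[3])>sumpf && file<3) { file+=1; sum=0; };
    print L[i] > ("file" file);          # parenthesized so the file name is unambiguous
    sum+=a[3];
  }
}' input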
- This script reads all input into the array L and calculates sum.
- In the END block, the sum per file, sumpf, is calculated and the output is written.
In contrast to the other solution, this names the input file only once and reads it in a single pass, at the cost of holding every line in memory.
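The same single-pass idea can be extended to any number of output files. This is a minimal sketch, assuming a hypothetical variable n for the file count and hypothetical output names part1, part2, and so on (neither appears in the answer). It walks the stored rows by numeric index so they are written in input order, and it uses only the first six sample rows to keep the example short.

```shell
# Scratch directory so the hypothetical part1..partN files do not overwrite anything.
cd "$(mktemp -d)" || exit 1

# First six rows of the sample input; the 3rd column sums to 118.
cat > input <<'EOF'
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
EOF

# n is an assumed parameter: the number of output files.
awk -v n=2 '
{ line[NR] = $0; val[NR] = $3; sum += $3 }   # store every row and total the 3rd column
END {
  limit = sum / n                            # target total per output file
  file = 1; run = 0
  for (i = 1; i <= NR; i++) {                # numeric index keeps input order
    if (run + val[i] > limit && file < n) {  # row would exceed the limit: start the next file
      close("part" file)
      file++
      run = 0
    }
    print line[i] > ("part" file)
    run += val[i]
  }
}' input
```

With n=2 the limit is 59, so the first three rows (total 42) go to part1 and the remaining rows go to part2, since the last file takes everything that is left.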
Source: https://stackoverflow.com/questions/62425230/sum-value-of-a-3rd-row-and-divide-rows-accordingly