Sum the values of the 3rd column and divide rows accordingly

Question

I have a file as below with n rows. I want to total its 3rd column and distribute the rows across 3 different files, based on that sum.

For example, the sum of all the 3rd-column values is 516, and dividing that by 3 gives 172.

So I want to add rows to the first file as long as its total doesn't exceed 172, do the same for the second file, and move all the remaining rows to the third file.
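
The arithmetic above can be checked with a short awk sketch. It assumes the sample rows are saved in a file named Input_file (an assumed name) inside a scratch directory; the shell variable names total and target are illustrative only.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample input from the question (file name is an assumption).
cat > Input_file <<'EOF'
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
l ah 56
o aj 76
l ai 12
q al 09
d pl 34
e ik 30
f ll 10
g dl 15
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19
EOF

# Sum the 3rd column, then derive the per-file target (total / 3).
total=$(awk '{ sum += $3 } END { print sum }' Input_file)
target=$(awk '{ sum += $3 } END { print sum / 3 }' Input_file)
echo "total=$total target=$target"
```

With the sample data this should report a total of 516 and a target of 172, matching the figures in the question.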

Input file

a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
l ah 56
o aj 76 
l ai 12 
q al 09
d pl 34
e ik 30
f ll 10
g dl 15 
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19

Expected output

file1 (total - 163)


a   aa  10
b   ab  15
c   ac  17
a   dy  30
y   ae  12
a   dl  34
a   fk  45

file2 (total - 153)

l   ah  56
o   aj  76
l   ai  12
q   al  9

file3 (total - 200)

d   pl  34
e   ik  30
f   ll  10
g   dl  15
h   fr  17
i   dd  23
j   we  27
k   rt  12
l   yt  13
m   tt  19

Answer 1:

Could you please try the following? It was written and tested with the shown samples in GNU awk.

awk '
FNR==NR{
  sum+=$NF
  next
}
FNR==1{
  count=sum/3
}
{
  curr_sum+=$NF
}
(curr_sum>=count || FNR==1) && fileCnt<=2{
  close(out_file)
  out_file="file" ++fileCnt
  curr_sum=$NF
}
{
  print > (out_file)
}'   Input_file  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                               ##Start of the awk program.
FNR==NR{                                            ##FNR==NR is TRUE only while Input_file is being read for the first time.
  sum+=$NF                                          ##Add the last field of every line to sum to get the total for the whole Input_file.
  next                                              ##next skips all further statements for this line.
}
FNR==1{                                             ##TRUE on the first line of the second read of Input_file.
  count=sum/3                                       ##Set count to sum/3, the target total per output file.
}
{
  curr_sum+=$NF                                     ##Add the last field of the current line to curr_sum.
}
(curr_sum>=count || FNR==1) && fileCnt<=2{          ##If the running sum has reached count OR this is the first line (of the second read), AND fewer than 3 output files have been started.
  close(out_file)                                   ##Close the previous output file; may NOT be needed here since there are only 3 output files.
  out_file="file" ++fileCnt                         ##Build the next output file name.
  curr_sum=$NF                                      ##Restart curr_sum from the current line, which is the first line of the new file.
}
{
  print > (out_file)                                ##Print the current line into the current output file.
}'   Input_file  Input_file                         ##Input_file is named twice so that it is read twice.
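
One way to check the result against the totals listed in the question is to sum the 3rd column of each output file after the split. The sketch below is only an illustration: it recreates the sample input in a scratch directory, runs the same two-pass program, and stores the three totals in a shell variable named totals (a name chosen here, not part of the answer).

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample input from the question.
cat > Input_file <<'EOF'
a aa 10
b ab 15
c ac 17
a dy 30
y ae 12
a dl 34
a fk 45
l ah 56
o aj 76
l ai 12
q al 09
d pl 34
e ik 30
f ll 10
g dl 15
h fr 17
i dd 23
j we 27
k rt 12
l yt 13
m tt 19
EOF

# Two-pass split: the first pass sums the last field,
# the second pass writes the rows to file1, file2 and file3.
awk '
FNR==NR { sum += $NF; next }
FNR==1  { count = sum / 3 }
{ curr_sum += $NF }
(curr_sum >= count || FNR==1) && fileCnt <= 2 {
  close(out_file)
  out_file = "file" ++fileCnt
  curr_sum = $NF
}
{ print > (out_file) }' Input_file Input_file

# Sum the 3rd column per output file and print the three totals on one line.
totals=$(awk '{ sum[FILENAME] += $3 }
              END { print sum["file1"], sum["file2"], sum["file3"] }' file1 file2 file3)
echo "$totals"
```

For the sample data the three totals should come out as 163, 153 and 200, the same as in the expected output.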

Answer 2:

awk '{ L[nr++]=$0; sum+=$3 }
     END{ sumpf=sum/3; sum=0; file=1;
          for(i=0; i<nr; i++) { split(L[i],a);
          if ((sum+a[3])>sumpf && file<3) { file+=1; sum=0; };
          print L[i] > ("file" file);
          sum+=a[3];
        }
    }'  input

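This script reads all input lines into the array L and adds up the 3rd column as it goes.
In the END block, the target sum per file (sumpf) is calculated and the lines are written to the output files.

In contrast to the other solution, this one needs the input file to be named only once, at the cost of holding all lines in memory.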
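
One detail matters when replaying lines stored in an array: POSIX awk leaves the traversal order of `for (i in L)` unspecified, so a numeric index loop is the safer way to get the lines back in input order. A minimal sketch with three sample rows (the variable names n and replayed are illustrative only):

```shell
# Store each line under a numeric index, then replay the lines by counting
# from 0 to n-1 so the output order matches the input order.
replayed=$(printf 'a aa 10\nb ab 15\nc ac 17\n' |
  awk '{ L[n++] = $0 } END { for (i = 0; i < n; i++) print L[i] }')
echo "$replayed"
```

The replayed lines should appear in exactly the order they were read, which is what keeps consecutive rows together in each output file.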

Source: https://stackoverflow.com/questions/62425230/sum-value-of-a-3rd-row-and-divide-rows-accordingly

Tags

awk

sed