How to group by hour and compute the sum and average of each column in Python?

南旧 2020-12-20 07:33

As input I have a CSV file with times and a bunch of numbers for each time.

Time,F1,F2,F3
8:11,5,2,4
9:25,9,8,2
9:39,7,3,2
9:53,6,5,1
10:07,4,6,7
10:21,7,3,1         


        
2 answers
  • 2020-12-20 08:01

    The following should get you started. It uses Python's csv module to read and write the files and itertools.groupby to group the entries by hour:

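    import csv
    from itertools import groupby, chain
    
    # Binary modes ('rb'/'wb') are what the csv module expects on Python 2.
    with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
        csv_input = csv.reader(f_input)
        csv_output = csv.writer(f_output)
        header = next(csv_input)  # skip the input header row
        csv_output.writerow(["Time","SUM F1","SUM F2","SUM F3","AVG F1","AVG F2","AVG F3"])
    
        # groupby only merges consecutive rows, so the input must be ordered by time.
        # The key is the hour: '9:25' -> 9.
        for k, g in groupby(csv_input, lambda x: int(x[0].split(':')[0])):
            entries = [(int(f1), int(f2), int(f3)) for t, f1, f2, f3 in g]
            # One (sum, average) pair per column.
            sums = [(sum(x), sum(x)/float(len(entries))) for x in zip(*entries)]
            # Flatten to [SUM F1, SUM F2, SUM F3, AVG F1, AVG F2, AVG F3].
            row = ['{}:00'.format(k)] + list(chain.from_iterable(zip(*sums)))
            csv_output.writerow(row)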
    

    This produces an output CSV file like the following (the 11:00 and 12:00 rows come from input lines that the excerpt in the question does not show):

     Time,SUM F1,SUM F2,SUM F3,AVG F1,AVG F2,AVG F3
     8:00,5,2,4,5.0,2.0,4.0
     9:00,22,16,5,7.333333333333333,5.333333333333333,1.6666666666666667
     10:00,16,15,15,5.333333333333333,5.0,5.0
     11:00,1,2,1,1.0,2.0,1.0
     12:00,3,3,1,3.0,3.0,1.0
    

    zip(*entries) transposes the rows into columns, so that each of F1, F2 and F3 can be summed separately.
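
To make the transposition step concrete, here is a small sketch using the three 9:xx rows from the question. The variable names `columns`, `column_sums`, `pairs` and `flat` are only illustrative and do not appear in the answer's code.

```python
from itertools import chain

# Three rows, each holding (F1, F2, F3): the 9:25, 9:39 and 9:53 rows
entries = [(9, 8, 2), (7, 3, 2), (6, 5, 1)]

# zip(*entries) is zip((9, 8, 2), (7, 3, 2), (6, 5, 1)): it pairs up the
# first items, then the second items, then the third items
columns = list(zip(*entries))
print(columns)  # [(9, 7, 6), (8, 3, 5), (2, 2, 1)]

column_sums = [sum(col) for col in columns]
print(column_sums)  # [22, 16, 5]

# A second transpose turns (sum, average) pairs into all sums followed by all averages
pairs = [(s, s / float(len(entries))) for s in column_sums]
flat = list(chain.from_iterable(zip(*pairs)))
```

The first three items of `flat` are the sums and the last three are the averages, which is the column order written to the output file.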

    Tested using Python 2.7.9
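
On Python 3 the csv module works with text-mode files, so the 'rb'/'wb' modes above would need to change. The following is a rough Python 3 sketch of the same approach, not a tested port of the answer. The helper names `summarize_by_hour` and `convert` are made up for this example, and it assumes the rows are already ordered by time.

```python
import csv
from itertools import groupby


def summarize_by_hour(rows):
    """Yield [hour label, sums..., averages...] for each run of rows sharing an hour.

    rows: iterable of [time, v1, v2, ...] string lists, already ordered by time.
    """
    for hour, group in groupby(rows, key=lambda r: int(r[0].split(':')[0])):
        entries = [[int(v) for v in r[1:]] for r in group]
        columns = list(zip(*entries))  # transpose rows into columns
        sums = [sum(col) for col in columns]
        avgs = [s / len(entries) for s in sums]  # true division on Python 3
        yield ['{}:00'.format(hour)] + sums + avgs


def convert(in_path, out_path):
    # newline='' is what the csv module documentation recommends for text-mode files
    with open(in_path, newline='') as f_in, open(out_path, 'w', newline='') as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        names = next(reader)[1:]  # column names after 'Time'
        writer.writerow(['Time'] + ['SUM ' + n for n in names] + ['AVG ' + n for n in names])
        writer.writerows(summarize_by_hour(reader))


# The six sample rows from the question, used as in-memory input
sample = [
    ['8:11', '5', '2', '4'],
    ['9:25', '9', '8', '2'],
    ['9:39', '7', '3', '2'],
    ['9:53', '6', '5', '1'],
    ['10:07', '4', '6', '7'],
    ['10:21', '7', '3', '1'],
]
result = list(summarize_by_hour(sample))
for row in result:
    print(row)
```

Calling `convert('input.csv', 'output.csv')` would write the same kind of file as the Python 2 version. With only the six rows shown in the question, the 10:00 group contains two rows, so its sums are 11, 9, 8 rather than the 16, 15, 15 in the output above.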

  • 2020-12-20 08:17

    A pandas solution:

    import pandas as pd
    
    df = pd.read_csv('f123.csv')
    # Truncate each time to its hour, e.g. '9:25' -> '9:00'
    df['Time'] = df['Time'].apply(lambda x: x.split(':')[0] + ':00')
    by_hour = df.groupby('Time')
    # Build one SUM and one AVG column per input column
    data = {}
    for name in ['F1', 'F2', 'F3']:
        data['SUM ' + name] = by_hour[name].sum()
        data['AVG ' + name] = by_hour[name].mean()
    res = pd.DataFrame(data)
    print(res)
    

    prints:

             AVG F1    AVG F2    AVG F3  SUM F1  SUM F2  SUM F3
    Time                                                       
    10:00  5.333333  5.000000  5.000000      16      15      15
    11:00  1.000000  2.000000  1.000000       1       2       1
    12:00  3.000000  3.000000  1.000000       3       3       1
    8:00   5.000000  2.000000  4.000000       5       2       4
    9:00   7.333333  5.333333  1.666667      22      16       5
    

    Save as csv file:

    res.to_csv('res.csv')
    

    This is the content of res.csv:

    Time,AVG F1,AVG F2,AVG F3,SUM F1,SUM F2,SUM F3
    10:00,5.333333333333333,5.0,5.0,16,15,15
    11:00,1.0,2.0,1.0,1,2,1
    12:00,3.0,3.0,1.0,3,3,1
    8:00,5.0,2.0,4.0,5,2,4
    9:00,7.333333333333333,5.333333333333333,1.6666666666666667,22,16,5
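
Because the group key above is a string, the groups sort lexicographically and 10:00 lands before 8:00. One possible variation, sketched here rather than taken from the answer, groups on the integer hour and uses `agg` with a list of functions. The sample data is inlined through `io.StringIO` so the snippet does not depend on a file, and the `labels` mapping is just a helper for this example.

```python
import io
import pandas as pd

# The six sample rows from the question, read from a string instead of a file
csv_text = """Time,F1,F2,F3
8:11,5,2,4
9:25,9,8,2
9:39,7,3,2
9:53,6,5,1
10:07,4,6,7
10:21,7,3,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Integer hour as the group key, so groups sort numerically (8, 9, 10)
hour = df['Time'].str.split(':').str[0].astype(int)

# agg with a list of functions returns two-level columns such as ('F1', 'sum')
res = df.groupby(hour)[['F1', 'F2', 'F3']].agg(['sum', 'mean'])

# Flatten the column names to 'SUM F1', 'AVG F1', ... and restore the 'H:00' labels
labels = {'sum': 'SUM', 'mean': 'AVG'}
res.columns = ['{} {}'.format(labels[func], col) for col, func in res.columns]
res.index = ['{}:00'.format(h) for h in res.index]
res.index.name = 'Time'
print(res)
```

As before, `res.to_csv('res.csv')` would save the result. Here the columns come out grouped per input column (SUM F1, AVG F1, SUM F2, ...), and the rows are in chronological order.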
    