Python - Calculate average for every column in a csv file

执笔经年 2020-12-10 20:02

I'm new to Python and I'm trying to get the average of every column (or row) of a CSV file, and then select the values that are higher than double the average of its

6 Answers
  • 2020-12-10 20:11

    I hope this helps. Here is what I would do, which is to use numpy:

        import csv
        import numpy as np

        # Assume that you have 2 columns and a header row. The columns are
        # (1) question 1; (2) question 2.
        with open('filename.csv', 'r', newline='') as f:  # path to your CSV file
            data = list(csv.reader(f))

        header = data.pop(0)  # remove the header row from the data

        # Columns are 0-indexed, so with two columns the valid indices are 0 and 1
        # (the original indices 1 and 2 would raise IndexError on a 2-column file).
        q1 = [int(row[0]) for row in data]
        q2 = [int(row[1]) for row in data]

        # ========================================
        # === Means/Variance - Work-up Section ===
        # ========================================
        print('Mean - Question-1:            ', np.mean(q1))
        print('Variance,Question-1:          ', np.var(q1))
        print('==============================================')
        print('Mean - Question-2:            ', np.mean(q2))
        print('Variance,Question-2:          ', np.var(q2))
    
  • 2020-12-10 20:12

    Here's a cleaned-up version of your function, but it probably doesn't do what you want it to do. Currently, it computes a single average over all values in all columns:

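    def average_column(csv):
        f = open(csv, "r")
        total = 0.0
        value_count = 0
        for row in f:
            for column in row.split(','):
                total += float(column)
                value_count += 1
        # divide by the number of values read; len(column) would be the
        # length of the last string parsed, not a count of values
        average = total / value_count
        f.close()
        return 'The average is:', average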
    

    I would use the csv module (which makes csv parsing easier), with a Counter object to manage the column totals and a context manager to open the file (no need for a close()):

    import csv
    from collections import Counter

    def average_column(csv_filepath):
        column_totals = Counter()
        # newline="" is what the csv module expects for text files in Python 3
        with open(csv_filepath, newline="") as f:
            reader = csv.reader(f)
            row_count = 0.0
            for row in reader:
                for column_idx, column_value in enumerate(row):
                    try:
                        n = float(column_value)
                        column_totals[column_idx] += n
                    except ValueError:
                        print("Error -- ({}) Column({}) could not be converted to float!".format(column_value, column_idx))
                row_count += 1.0

        # row_count is now 1 too many (this assumes the first row is a header,
        # which was counted above), so decrement it back down
        row_count -= 1.0

        # make sure column index keys are in order
        column_indexes = sorted(column_totals)

        # calculate per column averages using a list comprehension
        averages = [column_totals[idx] / row_count for idx in column_indexes]
        return averages
    
  • 2020-12-10 20:14

    First of all, as people say, the CSV format looks simple, but it can be quite nontrivial, especially once strings come into play. monkut already gave you two solutions: the cleaned-up version of your code, and one more that uses the csv library. I'll give yet another option: no libraries, but plenty of idiomatic code to chew on, which gives you the averages for all columns at once.

    def get_averages(csv):
        with open(csv) as file:
            lines = file.readlines()
            # parse each line into a list of floats (list() is needed in Python 3)
            rows_of_numbers = [list(map(float, line.split(','))) for line in lines]
            # zip(*rows) groups the values by column; sum each column
            sums = list(map(sum, zip(*rows_of_numbers)))
            averages = [sum_item / len(lines) for sum_item in sums]
            return averages
    

    Things to note: In your code, f is a file object, and you try to close it after you have already returned the value. That line is never reached: nothing executes after a return unless you use a try...finally construct or a with construct (as I do here), which closes the stream automatically.

    map(f, l), or the equivalent [f(x) for x in l], applies function f to each element of l. In Python 2, map returns a new list; in Python 3 it returns a lazy iterator, so wrap it in list() when you need an actual list.

    f(*l) will "unpack" the list l before function invocation, giving to function f each element as a separate argument.
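
    As a small illustration of the two notes above, here is a sketch on made-up data (the rows list is hypothetical, standing in for lines that have already been split on commas). It shows how map converts each row and how zip(*...) turns rows into columns:

```python
# Three rows of a hypothetical two-column table, already split into strings.
rows = [["1", "10"], ["2", "20"], ["3", "30"]]

# map(f, l) applies f to each element; list() materialises the result in Python 3.
numbers = [list(map(float, row)) for row in rows]

# zip(*numbers) is the same as zip(numbers[0], numbers[1], numbers[2]):
# it pairs up the first elements, then the second elements, which
# transposes the rows into columns.
columns = list(zip(*numbers))

# One average per column.
averages = [sum(column) / len(column) for column in columns]

print(columns)
print(averages)
```

    Because each column comes out as a tuple of numbers, sum and len can be applied to it directly.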

  • 2020-12-10 20:14

    I suggest breaking this into several smaller steps:

    1. Read the CSV file into a 2D list or 2D array.
    2. Calculate the averages of each column.

    Each of these steps can be implemented as a separate function. (In a realistic situation where the CSV file is large, reading the complete file into memory might be prohibitive due to space constraints. However, for a learning exercise, this is a great way to gain an understanding of writing your own functions.)
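
    The two steps could be sketched roughly as follows. The function names, the has_header flag, and the sample data are all made up for illustration, and the third helper is only one possible reading of the "higher than double the average" goal in the question:

```python
import csv

def read_rows(lines, has_header=True):
    """Step 1: parse CSV text lines into a 2D list of floats.

    `lines` can be an open file or any iterable of strings.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    # `if row` skips blank lines, which csv.reader yields as empty lists
    return [[float(value) for value in row] for row in reader if row]

def column_averages(rows):
    """Step 2: compute the average of each column."""
    return [sum(column) / len(column) for column in zip(*rows)]

def above_double_average(rows):
    """Hypothetical follow-up: per column, keep values above 2x the column average."""
    averages = column_averages(rows)
    return [[value for value in column if value > 2 * average]
            for column, average in zip(zip(*rows), averages)]

# Made-up sample data standing in for the contents of a file.
sample = ["a,b", "1,10", "2,20", "9,30"]
rows = read_rows(sample)
print(column_averages(rows))
print(above_double_average(rows))
```

    To read a real file, pass the open file object instead of the list, for example inside `with open(path, newline="") as f:` followed by `rows = read_rows(f)`, where path is whatever your file is called.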

  • 2020-12-10 20:19

    This definitely worked for me!

    import csv
    import numpy as np

    # placeholder: set this to the 0-based index of the column you want to average
    your_column_number = 0

    with open('C:\\...\\your_file_name.csv', 'r', newline='') as f:
        data = list(csv.reader(f))

    # in case you have a header/title in the first row of your csv file, do the next line; else skip it
    data.pop(0)

    q1 = [int(row[your_column_number]) for row in data]

    print('Mean of your_column_number :            ', np.mean(q1))
    
  • 2020-12-10 20:24

    If you want to do it without stdlib modules for some reason:

    with open('path/to/csv') as infile:
        columns = list(map(float,next(infile).split(',')))
        for how_many_entries, line in enumerate(infile,start=2):
            for (idx,running_avg), new_data in zip(enumerate(columns), line.split(',')):
                columns[idx] += (float(new_data) - running_avg)/how_many_entries
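
    The last line of that loop is an incremental mean: after n values, new_mean = old_mean + (x - old_mean) / n, so the file never has to be held in memory. Here is the same idea as a sketch wrapped in a function so it can be tried on a few lines of made-up data (the function name and the sample values are illustrative, and it assumes there is no header row):

```python
def running_column_means(lines):
    """Return per-column means, reading one line at a time."""
    means = None
    for count, line in enumerate(lines, start=1):
        values = [float(field) for field in line.split(',')]
        if means is None:
            # the first row is the mean of a single sample
            means = values
            continue
        for idx, value in enumerate(values):
            # incremental update: shift the mean towards the new value by 1/count
            means[idx] += (value - means[idx]) / count
    return means

# Made-up sample lines; an open file object works the same way.
sample = ["1,10", "2,20", "3,30", "6,40"]
print(running_column_means(sample))
```

    For this sample the column means are 3 and 25, which matches (1 + 2 + 3 + 6) / 4 and (10 + 20 + 30 + 40) / 4.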
    