I'm new to Python and I'm trying to get the average of every column (or row) of a CSV file, and then select the values that are higher than double the average of its
I hope this helps. Here is what I would do, which is to use numpy:
# ==========================
import numpy as np
import csv
# Assume that you have 2 columns and a header row. The columns are
# (1) question 1; (2) question 2
# ========================================
# 'filename.csv' is the file that holds your original data
with open('filename.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
header = data.pop(0)  # remove the header row and keep it
q1 = []
q2 = []
# ========================================
for row in data:
    # columns are zero-indexed: 0 is question 1, 1 is question 2
    q1.append(int(row[0]))
    q2.append(int(row[1]))
# ========================================
# ========================================
# === Means/Variance - Work-up Section ===
# ========================================
print ('Mean - Question-1: ', (np.mean(q1)))
print ('Variance,Question-1: ', (np.var(q1)))
print ('==============================================')
print ('Mean - Question-2: ', (np.mean(q2)))
print ('Variance,Question-2: ', (np.var(q2)))
Here's a cleaned-up version of your function, but it probably doesn't do what you want. Currently, it computes the average of all values in all columns:
def average_column(csv):
    f = open(csv, "r")
    total = 0.0
    value_count = 0
    for row in f:
        for column in row.split(','):
            total += float(column)
            value_count += 1
    f.close()
    # divide by the number of values read, not by len(column),
    # which is only the character length of the last field
    average = total / value_count
    return 'The average is:', average
I would use the csv module (which makes CSV parsing easier), with a Counter object to manage the column totals and a context manager to open the file (no need for a close()):
import csv
from collections import Counter

def average_column(csv_filepath):
    column_totals = Counter()
    with open(csv_filepath, newline="") as f:
        reader = csv.reader(f)
        row_count = 0.0
        for row in reader:
            for column_idx, column_value in enumerate(row):
                try:
                    n = float(column_value)
                    column_totals[column_idx] += n
                except ValueError:
                    print("Error -- ({}) Column({}) could not be converted to float!".format(column_value, column_idx))
            row_count += 1.0
    # this assumes the first row is a header: it was counted above
    # (its cells fail the float conversion), so decrement row_count back down
    row_count -= 1.0
    # make sure column index keys are in order
    column_indexes = sorted(column_totals)
    # calculate per column averages using a list comprehension
    averages = [column_totals[idx] / row_count for idx in column_indexes]
    return averages
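None of the answers so far covers the second half of the question: picking out the values that are higher than double the average. Here is a minimal sketch of that step, assuming a file with no header row and numeric cells only. The function name values_above_double_average and the sample data are made up for illustration.

```python
import csv
import io

def values_above_double_average(rows):
    """Return, per column, the values greater than twice that column's mean.

    `rows` is an iterable of equal-length rows of numeric strings (no header).
    """
    table = [[float(cell) for cell in row] for row in rows]
    columns = list(zip(*table))  # transpose: rows -> columns
    averages = [sum(col) / len(col) for col in columns]
    return [
        [value for value in col if value > 2 * avg]
        for col, avg in zip(columns, averages)
    ]

# Hypothetical sample data, parsed by csv.reader just as a real file would be
sample = io.StringIO("1,10\n2,10\n3,10\n30,10\n")
outliers = values_above_double_average(csv.reader(sample))
print(outliers)  # [[30.0], []]
```

To do this per row instead of per column, skip the zip(*table) transpose and work on table directly.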
First of all, as people say: the CSV format looks simple, but it can be quite nontrivial, especially once strings come into play. monkut already gave you two solutions: the cleaned-up version of your code, and one more that uses the csv library. I'll give yet another option: no libraries, but plenty of idiomatic code to chew on, which gives you averages for all columns at once.
def get_averages(csv):
    with open(csv) as file:
        lines = file.readlines()
    # one list of floats per line of the file
    rows_of_numbers = [map(float, line.split(',')) for line in lines]
    # zip(*rows) regroups the values by column, so each sum is a column total
    sums = map(sum, zip(*rows_of_numbers))
    averages = [sum_item / len(lines) for sum_item in sums]
    return averages
Things to note: In your code, f is a file object. You try to close it after you have already returned the value. This code will never be reached: nothing executes after a return has been processed, unless you have a try...finally construct, or a with construct (like I am using, which will automatically close the stream).
map(f, l), or the equivalent [f(x) for x in l], applies function f to each element of l. In Python 2, map returns a new list; in Python 3 it returns a lazy iterator, so wrap it in list() if you need an actual list.
f(*l) will "unpack" the list l before function invocation, passing each element to function f as a separate argument.
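A small illustration of the two notes above, using made-up data. It assumes Python 3, where map needs a list() around it to produce a list.

```python
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# map applies float to every item; the list comprehension does the same thing
as_floats = list(map(float, "1,2,3".split(',')))
same_thing = [float(x) for x in "1,2,3".split(',')]

# zip(*rows) means zip(rows[0], rows[1]): it groups the items column by column
columns = list(zip(*rows))
column_sums = [sum(col) for col in columns]

print(as_floats)    # [1.0, 2.0, 3.0]
print(columns)      # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
print(column_sums)  # [5.0, 7.0, 9.0]
```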
I suggest breaking this into several smaller steps:
Each of these steps can be implemented as two separate functions. (In a realistic situation where the CSV file is large, reading the complete file into memory might be prohibitive due to space constraints. However, for a learning exercise, this is a great way to gain an understanding of writing your own functions.)
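The individual steps are not spelled out here, so the split below is only an assumption: one function that reads the whole file into memory, and one that averages the columns of the resulting table. The names read_rows and column_averages are hypothetical, and the sketch assumes numeric cells with no header row.

```python
import csv
import io

def read_rows(csv_file):
    """Assumed step 1: read an open CSV file into a list of lists of floats."""
    return [[float(cell) for cell in row] for row in csv.reader(csv_file) if row]

def column_averages(rows):
    """Assumed step 2: average each column of the in-memory table."""
    return [sum(col) / len(col) for col in zip(*rows)]

# Hypothetical sample data standing in for a real file opened with open(...)
sample = io.StringIO("1,2\n3,4\n5,6\n")
rows = read_rows(sample)
averages = column_averages(rows)
print(averages)  # [3.0, 4.0]
```

Keeping the two steps apart means each one can be tested on its own, and the reading step can later be swapped for something that does not load the whole file.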
This definitely worked for me!
import numpy as np
import csv
your_column_number = 0  # placeholder: zero-based index of the column to average
with open('C:\\...\\your_file_name.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
# in case you have a header/title in the first row of your CSV file, do the next line; otherwise skip it
data.pop(0)
q1 = []
for row in data:
    q1.append(int(row[your_column_number]))
print('Mean of your_column_number: ', np.mean(q1))
If you want to do it without stdlib modules for some reason:
with open('path/to/csv') as infile:
    # the first line seeds the running averages
    columns = list(map(float, next(infile).split(',')))
    for how_many_entries, line in enumerate(infile, start=2):
        for (idx, running_avg), new_data in zip(enumerate(columns), line.split(',')):
            # move each running average toward the new value by 1/n of the difference
            columns[idx] += (float(new_data) - running_avg) / how_many_entries
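The loop above keeps a running average per column instead of a running sum: after reading the n-th value x, the average becomes avg + (x - avg) / n. Below is a sketch of the same idea wrapped in a function so it can be checked against a plain sum-and-divide. The name streaming_column_averages and the sample data are made up, and it assumes every line has the same number of numeric fields.

```python
import io

def streaming_column_averages(lines):
    """Average each column of comma-separated numeric lines in a single pass."""
    lines = iter(lines)
    # the first line seeds the running averages
    averages = [float(x) for x in next(lines).split(',')]
    for n, line in enumerate(lines, start=2):
        for idx, field in enumerate(line.split(',')):
            # incremental mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n
            averages[idx] += (float(field) - averages[idx]) / n
    return averages

# Hypothetical sample data; an open file object can be passed the same way
sample = io.StringIO("1,10\n2,20\n6,30\n")
result = streaming_column_averages(sample)
print(result)  # [3.0, 20.0]
```

Only one row is held in memory at a time, which matters for large files.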