I am trying to manipulate a large CSV file using Pandas. When I wrote this

df = pd.read_csv(strFileName, sep='\t', delimiter='\t')

it raises an out-of-memory error.
Based on your snippet in out of memory error when reading csv file in chunk, here is what the processing looks like when reading the file line by line. I assume that kb_2 is the error indicator:
groups = {}
with open("data/petaJoined.csv", "r") as large_file:
    for line in large_file:
        arr = line.split('\t')
        # assuming this structure: ka, kb_1, kb_2, timeofEvent, timeInterval
        k = arr[0] + ',' + arr[1]  # group key: ka,kb_1
        if k not in groups:
            groups[k] = {'record_count': 0, 'error_sum': 0}
        groups[k]['record_count'] += 1
        groups[k]['error_sum'] += float(arr[2])

for k, v in groups.items():
    print('{group}: {error_rate}'.format(group=k, error_rate=v['error_sum'] / v['record_count']))
This snippet stores all the groups in a dictionary and calculates the error rate after reading the entire file. Because the dictionary grows with the number of distinct groups rather than the number of rows, it can still encounter an out-of-memory exception if there are too many combinations of groups.
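
If you would rather stay in pandas, read_csv accepts a chunksize parameter that yields the file piece by piece instead of loading it all at once. Below is a minimal sketch, assuming the same tab-separated, headerless five-column layout as above; the column names and the chunk size here are my own illustrative choices, not something from your file:

import pandas as pd

# Assumed column layout, matching the line-by-line code above.
cols = ['ka', 'kb_1', 'kb_2', 'timeofEvent', 'timeInterval']
totals = None

# chunksize bounds how many rows are held in memory at a time;
# 10**5 is an arbitrary choice.
for chunk in pd.read_csv("data/petaJoined.csv", sep='\t', header=None,
                         names=cols, chunksize=10**5):
    # Per-chunk sum and count of the error indicator for each group.
    part = chunk.groupby(['ka', 'kb_1'])['kb_2'].agg(['sum', 'count'])
    # Accumulate across chunks, aligning on the group index.
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals['sum'] / totals['count'])  # error rate per (ka, kb_1) group

Note that this only bounds the memory used for reading rows; the per-group totals table still grows with the number of distinct groups, exactly like the dictionary above.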
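
To also handle the too-many-groups case, a plainly different approach is to pre-sort the file by its key columns and then stream it with itertools.groupby, so only one group's running totals are in memory at any time. A sketch, assuming the file can first be sorted externally (the sorted file name is hypothetical), e.g. with the Unix sort utility: sort -t$'\t' -k1,1 -k2,2 data/petaJoined.csv > data/petaJoined_sorted.csv

import itertools

# With sorted input, all lines of a group are adjacent, so groupby
# can emit one group at a time with O(1) memory for the totals.
with open("data/petaJoined_sorted.csv", "r") as sorted_file:
    rows = (line.rstrip('\n').split('\t') for line in sorted_file)
    for key, group in itertools.groupby(rows, key=lambda arr: (arr[0], arr[1])):
        record_count = 0
        error_sum = 0.0
        for arr in group:
            record_count += 1
            error_sum += float(arr[2])  # kb_2, the assumed error indicator
        print('{group}: {error_rate}'.format(
            group=','.join(key), error_rate=error_sum / record_count))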