Question
What I need to do is calculate the following:
For each row, the number of times the person (column 8) appears in the list on dates prior to that row's date, with the same t value in column 10 and with a 1 in column 7.
For each row, the number of times the person (column 8) appears in the list on dates prior to that row's date, with the same t value in column 10, regardless of column 7. (Note that the rows are sorted chronologically.)
It might be easier to demonstrate this with an example. Here is the raw data from the CSV:
02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t
02/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t
02/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t
02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t
02/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t
03/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t
03/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t
04/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t
04/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t
04/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t
04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t
04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t
04/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t
05/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t
05/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t
What I need the CSV to look like after applying the two rules above, with the two counts appended as extra columns in that order:
02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t,0,0
02/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t,1,1
04/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t,0,0
04/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t,0,0
04/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t,0,1
04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t,0,1
04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t,1,2
04/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t,0,2
05/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t,1,1
05/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t,0,1
05/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t,0,2
05/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t,1,1
05/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t,0,1
So you can see that in the last row, Tom Phillips with 15t (column 10) had occurred once on days before this one, and in that one occurrence column 7 was not "1", so the appended counts are 0 and 1.
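To pin down the two counts, here is a minimal brute-force sketch in Python. It only illustrates the rule as described above; the function name `prior_counts` and the variable `ones` are hypothetical, and it assumes the dates are in `%d/%m/%Y` format as in the sample. It compares every row with every other row, so it is quadratic and not suitable for a large file.

```python
from datetime import datetime

def prior_counts(rows):
    """For each row, return (ones, total) counted over strictly earlier dates.

    total: rows with the same name (column 8) and t value (column 10).
    ones:  the subset of those rows where column 7 is "1".
    """
    parsed = [
        (datetime.strptime(r[0], "%d/%m/%Y"), r[6], r[7], r[9]) for r in rows
    ]
    result = []
    for date, _, name, t in parsed:
        earlier = [
            p for p in parsed
            if p[0] < date and p[2] == name and p[3] == t
        ]
        ones = sum(1 for p in earlier if p[1] == "1")
        result.append((ones, len(earlier)))
    return result

# Three George Smith / 15t rows taken from the sample data.
rows = [
    "02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t".split(","),
    "03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t".split(","),
    "04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t".split(","),
]
print(prior_counts(rows))  # [(0, 0), (1, 1), (1, 2)]
```

These values match the counts shown for the same three rows in the expected output.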
My CSV data is obviously much larger than this, so efficient techniques and suggestions would also be appreciated. If more clarification is required please say so; it's hard to tell whether this example is understandable.
Kind regards AEA
Answer 1:
very minor change:
import csv
import datetime
from collections import defaultdict

# Python 3: the csv module expects text-mode files opened with newline=""
# (the Python 2 version of this script opened the output with "wb").
with open(r"C:\Temp\test2.csv", newline="") as i, open(r"C:\Temp\results2.csv", "w", newline="") as o:
    rdr = csv.reader(i)
    wrt = csv.writer(o)

    # data maps a (name, t) tuple to four running counts, for example:
    # {
    #   ("George Smith", "15t"): [
    #       count on earlier dates where column 7 == "1",
    #       count on earlier dates,
    #       count up to and including the current date where column 7 == "1",
    #       count up to and including the current date
    #   ]
    # }
    data, currdate = defaultdict(lambda: [0, 0, 0, 0]), None

    for line in rdr:
        date = datetime.datetime.strptime(line[0], "%d/%m/%Y")

        # The dictionary key is a tuple such as ("George Smith", "15t").
        name = (line[7], line[9])

        # When the date changes, everything counted so far belongs to earlier
        # dates. Copy the running counts (slots 2 and 3) into the
        # "earlier dates" slots (0 and 1) for every key, then remember the
        # new date. This relies on the input being sorted chronologically.
        if date != currdate:
            for v in data.values():
                v[:2] = v[2:]
            currdate = date

        # Write the current line plus the two "earlier dates" counts.
        wrt.writerow(line + data[name][:2])

        # Update the running counts with this row.
        data[name][3] += 1
        if line[6] == "1":
            data[name][2] += 1
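For a much larger file, one possible variation is sketched below. It is not the answer's code: instead of copying counts for every key at each date change, it keeps a second dictionary for the current date and merges only the keys seen that day. The names `append_prior_counts`, `seen`, and `today` are hypothetical. It assumes the rows are sorted chronologically and that equal dates are written as identical strings. Working on any iterable of rows also makes it possible to try it without files.

```python
from collections import defaultdict

def append_prior_counts(rows):
    """Yield each row with [ones, total] from strictly earlier dates appended."""
    seen = defaultdict(lambda: [0, 0])   # counts from earlier dates
    today = defaultdict(lambda: [0, 0])  # counts from the current date only
    current_date = None
    for row in rows:
        if row[0] != current_date:
            # The date changed: fold the finished day's counts into `seen`.
            for key, (ones, total) in today.items():
                seen[key][0] += ones
                seen[key][1] += total
            today.clear()
            current_date = row[0]
        key = (row[7], row[9])           # (name, t value)
        yield list(row) + list(seen[key])
        today[key][1] += 1
        if row[6] == "1":
            today[key][0] += 1

# A few George Smith rows taken from the sample data.
sample = [
    "02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t".split(","),
    "02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t".split(","),
    "03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t".split(","),
    "04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t".split(","),
    "04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t".split(","),
]
counts = [r[-2:] for r in append_prior_counts(sample)]
print(counts)  # [[0, 0], [0, 0], [1, 1], [0, 1], [1, 2]]
```

Because it is a generator, the result can be passed straight to `csv.writer(...).writerows(append_prior_counts(csv.reader(f)))` without holding the whole file in memory.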
Source: https://stackoverflow.com/questions/19123871/calculating-number-of-occurrences-of-dual-parameter-match-in-a-csv-appending-the