Suppose that I have a structured array of students (strings) and test scores (ints), where each entry is the score that a specific student received on a specific test. Each
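A slightly faster and simpler solution based on itertools, without using view(), is the following, where e is the structured array and groupby and argsort come from itertools and numpy respectively:
[(k, e['score'][list(g)].mean()) for k, g in groupby(argsort(e, order='student'), e['student'].__getitem__)]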
This is the same idea as ecatmur's answer, but it works with indices, using argsort() instead of sort().
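A self-contained version of the index-based approach, for trying it out. The sample data is made up for illustration and assumes the question's layout (a student string field and an integer score field). Passing order='student' to np.argsort makes the grouping key explicit instead of relying on the field order of the dtype.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample data: one row per (student, test) result.
grades = np.array(
    [("Alice", 90), ("Bob", 70), ("Alice", 80), ("Bob", 75), ("Carol", 85)],
    dtype=[("student", "U10"), ("score", "i4")],
)

# Row indices that sort the array by student, so equal students are adjacent.
order = np.argsort(grades, order="student")

# groupby walks the stream of row indices; the key maps each index to its student.
means = [
    (student, grades["score"][list(idx)].mean())
    for student, idx in groupby(order, key=grades["student"].__getitem__)
]
print(means)
```

Because groupby only merges adjacent equal keys, the argsort step is what makes each student appear as a single group.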
NumPy isn't designed to group rows together and apply aggregate functions to those groups. You could:
Here's the itertools solution, but as you can see it's quite complicated and inefficient. I'd recommend one of the other two methods.
import numpy as np
from itertools import groupby
from operator import itemgetter
np.array([(k, np.array(list(g), dtype=grades.dtype).view(np.recarray)['score'].mean())
          for k, g in groupby(np.sort(grades, order='student').view(np.recarray),
                              itemgetter('student'))], dtype=grades.dtype)
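For comparison, a vectorized sketch that uses only NumPy and no itertools: np.unique(..., return_inverse=True) labels every row with the index of its student, and np.bincount then sums the scores and counts the rows per label. This is a swapped-in technique rather than one of the answers in this thread, and the sample data is hypothetical. It also stores the means as float64, whereas the itertools version reuses grades.dtype, so with an integer score field its means get cast back to integers.

```python
import numpy as np

# Hypothetical sample data: one row per (student, test) result.
grades = np.array(
    [("Alice", 90), ("Bob", 70), ("Alice", 80), ("Bob", 75), ("Carol", 85)],
    dtype=[("student", "U10"), ("score", "i4")],
)

# Sorted unique student names, plus, for every row, the index of its student.
students, inverse = np.unique(grades["student"], return_inverse=True)

# Per-group sum of scores divided by per-group row count gives the mean.
totals = np.bincount(inverse, weights=grades["score"])
counts = np.bincount(inverse)
means = totals / counts

# Pack the result into a structured array with a float score column.
result = np.empty(len(students), dtype=[("student", students.dtype), ("score", "f8")])
result["student"] = students
result["score"] = means
print(result)
```

np.bincount only covers sum-like aggregates (sum, count, and mean derived from them); for something like a median you would still need to loop over the groups.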
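collapseByField(grades, 'student') gives what you want, after defining:
def collapseByField(e, collapsefield, keepFields=None, agg=None):
    import numpy as np
    import matplotlib.mlab  # rec_groupby needs an old Matplotlib; recent releases no longer ship it
    assert isinstance(e, np.ndarray)  # Structured array
    if agg is None:
        agg = np.mean
    if keepFields is None:
        # Aggregate every field except the one being grouped on
        keepFields = [n for n in e.dtype.names if n != collapsefield]
    # rec_groupby takes (field, aggregate function, output name) triples
    newf = [(n, agg, n) for n in keepFields]
    return matplotlib.mlab.rec_groupby(e, [collapsefield], newf)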
matplotlib.mlab.rec_groupby was exactly what I was looking for.
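Since matplotlib.mlab.rec_groupby is not available in recent Matplotlib releases, here is a rough NumPy-only stand-in for collapseByField. The name collapse_by_field, its parameters, and the choice of float64 for every aggregated field are assumptions made for this sketch, not Matplotlib's API. It loops over the groups in Python, trading speed for the freedom to pass any aggregate function as agg.

```python
import numpy as np


def collapse_by_field(e, collapsefield, keep_fields=None, agg=np.mean):
    """Hypothetical stand-in for rec_groupby: one output row per distinct
    value of `collapsefield`, with `agg` applied to each kept field."""
    if keep_fields is None:
        # Aggregate every field except the one being grouped on.
        keep_fields = [n for n in e.dtype.names if n != collapsefield]
    keys, inverse = np.unique(e[collapsefield], return_inverse=True)
    # The key keeps its dtype; aggregated fields are assumed to be float64.
    out_dtype = [(collapsefield, e.dtype[collapsefield])] + [(n, "f8") for n in keep_fields]
    out = np.empty(len(keys), dtype=out_dtype)
    out[collapsefield] = keys
    for i in range(len(keys)):
        rows = e[inverse == i]  # all rows belonging to group i
        for n in keep_fields:
            out[n][i] = agg(rows[n])
    return out


# Usage with hypothetical sample data.
grades = np.array(
    [("Alice", 90), ("Bob", 70), ("Alice", 80), ("Bob", 75), ("Carol", 85)],
    dtype=[("student", "U10"), ("score", "i4")],
)
summary = collapse_by_field(grades, "student")
print(summary)
```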