Question
I need to resample some data with numpy's weighted average function, and it just doesn't work.
This is my test-case:
import datetime
import numpy as np
import pandas as pd

time_vec = [
    datetime.datetime(2007, 1, 1, 0, 0),
    datetime.datetime(2007, 1, 1, 0, 1),
    datetime.datetime(2007, 1, 1, 0, 5),
    datetime.datetime(2007, 1, 1, 0, 8),
    datetime.datetime(2007, 1, 1, 0, 10),
]
df = pd.DataFrame([2, 3, 1, 7, 4], index=time_vec)
A normal resampling without weights works fine (using a lambda function as the how parameter, as suggested here: Pandas resampling using numpy percentile? Thanks!):
df.resample('5min',how = lambda x: np.average(x[0]))
But if I try to use some weights, it always raises "TypeError: Axis must be specified when shapes of a and weights differ":
df.resample('5min',how = lambda x: np.average(x[0],weights = [1,2,3,4,5]))
I tried this with many different numbers of weights, but it did not get better:
for i in xrange(20):
    try:
        print range(i)
        print df.resample('5min', how=lambda x: np.average(x[0], weights=range(i)))
        print i
        break
    except TypeError:
        print i, 'typeError'
I'd be glad about any suggestions.
Answer 1:
The short answer here is that the weights in your lambda need to be created dynamically based on the length of the series that is being averaged. In addition, you need to be careful about the types of objects that you're manipulating.
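To make the length mismatch concrete, here is a minimal sketch in plain NumPy, without any resampling. The names values, fixed_weights and dynamic_weights are made up for illustration; values stands in for the two samples that would land in a single 5-minute bin.

```python
import numpy as np

values = np.array([2, 3])         # stand-in for the samples in one 5-minute bin
fixed_weights = [1, 2, 3, 4, 5]   # fixed-length weights, as in the question

# The shapes differ (2 values vs. 5 weights), so np.average raises an error.
try:
    np.average(values, weights=fixed_weights)
    mismatch_raised = False
except (TypeError, ValueError) as exc:
    mismatch_raised = True
    print("np.average failed:", exc)

# Weights built from the length of the data always have a matching shape.
dynamic_weights = 1 + np.arange(len(values))   # array([1, 2])
weighted = np.average(values, weights=dynamic_weights)
print(weighted)   # (2*1 + 3*2) / (1 + 2)
```

Since each time bin can hold a different number of samples, no single fixed-length weight list can match every bin.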
The code that I got to compute what I think you're trying to do is as follows:
df.resample('5min', how=lambda x: np.average(x, weights=1+np.arange(len(x))))
There are two differences compared with the line that was giving you problems:
First, x[0] is now just x. The x object in the lambda is a pd.Series, and so x[0] gives just the first value in the series. This was working without raising an exception in the first example (without the weights) because np.average(c) just returns c when c is a scalar. But I think it was actually computing incorrect averages even in that case, because each of the sampled subsets was just returning its first value as the "average".

Second, the weights are created dynamically based on the length of data in the Series being resampled. You need to do this because the x in your lambda might be a Series of different length for each time interval being computed.
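As far as I know, recent pandas releases no longer accept the how= keyword, so the following is a sketch of the same idea written with resample(...).apply(...). The helper name weighted_avg and the empty-bin guard are assumptions added for illustration and are not part of the original answer.

```python
import datetime
import numpy as np
import pandas as pd

time_vec = [
    datetime.datetime(2007, 1, 1, 0, 0),
    datetime.datetime(2007, 1, 1, 0, 1),
    datetime.datetime(2007, 1, 1, 0, 5),
    datetime.datetime(2007, 1, 1, 0, 8),
    datetime.datetime(2007, 1, 1, 0, 10),
]
series = pd.Series([2, 3, 1, 7, 4], index=time_vec)

def weighted_avg(x):
    """Weighted average of one bin, with weights 1, 2, ..., len(x)."""
    # Assumption: return NaN for empty bins instead of letting np.average fail.
    if len(x) == 0:
        return np.nan
    return np.average(x, weights=1 + np.arange(len(x)))

# Each 5-minute bin is passed to weighted_avg as its own Series.
result = series.resample('5min').apply(weighted_avg)
print(result)
```

With the test data, the three bins hold (2, 3), (1, 7) and (4), so the weighted averages should come out as roughly 2.67, 5.0 and 4.0.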
The way I figured this out was through some simple type debugging, by replacing the lambda with a proper function definition:
def avg(x):
    print(type(x), x.shape, type(x[0]))
    return np.average(x, weights=np.arange(1, 1+len(x)))

df.resample('5Min', how=avg)
This let me have a look at what was happening with the x variable. Hope that helps!
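A similar inspection can be sketched for recent pandas by returning the length of each bin rather than printing it. This assumes resample(...).apply(...) is available; the name group_sizes is made up for illustration.

```python
import datetime
import pandas as pd

time_vec = [
    datetime.datetime(2007, 1, 1, 0, 0),
    datetime.datetime(2007, 1, 1, 0, 1),
    datetime.datetime(2007, 1, 1, 0, 5),
    datetime.datetime(2007, 1, 1, 0, 8),
    datetime.datetime(2007, 1, 1, 0, 10),
]
series = pd.Series([2, 3, 1, 7, 4], index=time_vec)

# Report how many samples each 5-minute bin receives.
group_sizes = series.resample('5min').apply(lambda x: len(x))
print(group_sizes)
```

The bins do not all have the same size here, which is why the weights have to be built from len(x) inside the function.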
Source: https://stackoverflow.com/questions/26370831/use-numpy-average-with-weights-for-resampling-a-pandas-array