I have been following a similar answer here, but I have some questions when using sklearn and rolling apply. I am trying to create z-scores and do PCA with rolling apply, bu
As @BrenBarn commented, the function passed to rolling apply needs to reduce a vector to a single number. The following is equivalent to what you were trying to do and helps highlight the problem.
zscore = lambda x: (x - x.mean()) / x.std()
tmp.rolling(5).apply(zscore)
TypeError: only length-1 arrays can be converted to Python scalars
In the zscore function, x.mean() reduces and x.std() reduces, but x is an array. Thus the entire expression is an array, not a single number.
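To make the shape problem concrete, here is a minimal sketch that calls the same z-score function on a single window by hand. The five-element array is made-up illustration data, not the asker's tmp.

```python
import numpy as np

# Hypothetical single rolling window of length 5 (illustration data only).
window = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def zscore(x):
    # x.mean() and x.std() are scalars, but x itself is the whole window,
    # so the result has one value per element of the window.
    return (x - x.mean()) / x.std()

result = zscore(window)
print(result.shape)  # (5,)
```

Rolling apply expects one value back per window, so a length-5 result cannot be stored in a single output cell.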
The way around this is to perform the roll on the parts of the z-score calculation that require it, and not on the parts that cause the problem.
(tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std()
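A self-contained sketch of the expression above, assuming tmp is a numeric DataFrame. The random data, the column names, and the names roll and rolling_z are placeholders chosen for illustration, since the original tmp is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tmp; the asker's actual data is not shown.
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.normal(size=(10, 2)), columns=["a", "b"])

# Roll only the parts that reduce (mean and std), then combine them
# with the unrolled data using ordinary element-wise arithmetic.
roll = tmp.rolling(5)
rolling_z = (tmp - roll.mean()) / roll.std()

print(rolling_z)
```

The first four rows are NaN because a full window of five observations is not yet available.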
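Since x in the lambda function represents one (rolling) window of the series, the lambda can be written like this (where x[-1] refers to the current, i.e. last, data point in the window):
zscore = lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
Then it is OK to call the following. In recent pandas versions, pass raw=True so that x arrives as a NumPy ndarray and x[-1] is a positional lookup:
tmp.rolling(5).apply(zscore, raw=True)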
Also note that ddof (delta degrees of freedom) defaults to 1 in tmp.rolling(5).std(). In order to generate the same results as @piRSquared's, one has to specify ddof=1 for x.std(), which defaults to 0 when x is a NumPy array. It took quite a while to figure this out!
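The ddof point can be checked with a short sketch comparing the apply-based version against the vectorized expression from the other answer. The random Series and the helper names zscore_ddof1 and zscore_ddof0 are illustrative assumptions, and raw=True is passed so that each window is a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tmp; the asker's actual data is not shown.
rng = np.random.default_rng(1)
tmp = pd.Series(rng.normal(size=20))

def zscore_ddof1(x):
    # x is a NumPy array (raw=True below); ddof=1 matches rolling().std().
    return (x[-1] - x.mean()) / x.std(ddof=1)

def zscore_ddof0(x):
    # NumPy's default ddof=0 gives a population standard deviation instead.
    return (x[-1] - x.mean()) / x.std()

vectorized = (tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std()
via_ddof1 = tmp.rolling(5).apply(zscore_ddof1, raw=True)
via_ddof0 = tmp.rolling(5).apply(zscore_ddof0, raw=True)

# Compare only rows with a full window (the first four are NaN).
matches = np.allclose(via_ddof1.dropna(), vectorized.dropna())
mismatch = not np.allclose(via_ddof0.dropna(), vectorized.dropna())
print(matches, mismatch)
```

With a window of 5, the ddof=0 z-scores are larger than the ddof=1 ones by a constant factor of sqrt(5/4), which is why the two answers disagree unless ddof is set explicitly.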