Comparing single dataframe value to previous 10 in same column

醉酒当歌 submitted on 2019-12-13 14:03:09

Question


In a dataframe, I would like to count how many of the prices from the previous 10 days are greater than today's price. Result would look like this:

price   ct>prev10
50.00   
51.00   
52.00   
50.50   
51.00   
50.00   
50.50   
53.00   
52.00   
49.00   
51.00   3

I have seen this post answered by DSM, but the requirement was different in that the base for comparison was a static number as opposed to the current row:

Achieving "countif" with pd.rolling_sum()

Of course I would like to do this without looping through the rows one by one. Pretty much stumped - thanks in advance for any advice.


Answer 1:


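You can use rolling(...).apply(...) on the series (older pandas versions exposed this as pd.rolling_apply, which has since been removed). I used a window length of 5 given the small size of the sample data, but you can easily change it. Note that the window includes the current row, so a window of 5 compares each price against the previous 4; for the previous 10 you would use a window of 11.

The lambda function counts how many items in the rolling group (excluding the last item) are greater than the last item.

import pandas as pd

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

window = 5  # Given that sample data only contains 11 values.
# raw=True passes each window to the lambda as a NumPy array.
df['price_count'] = df.price.rolling(window).apply(
    lambda group: sum(group[:-1] > group[-1]), raw=True)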
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2

In the example above, the first group is the prices with index values 0-4. You can see what is happening with:

group = df.price[:window].values
>>> group
array([ 50. ,  51. ,  52. ,  50.5,  51. ])

Now, do your comparison of the previous four prices to the current price:

>>> group[:-1] > group[-1]
array([False, False,  True, False], dtype=bool)

Then, you are just summing the boolean values:

>>> sum(group[:-1] > group[-1])
1

This is the value placed at index 4, the row where the first full window ends.
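
The question asks about the previous 10 prices, so the window has to hold 11 rows (10 previous plus the current one). Here is a minimal sketch of that case, assuming a pandas version that provides Series.rolling(...).apply(..., raw=True). The names n_prev and ct_prev10 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

n_prev = 10          # number of previous rows to compare against (hypothetical name)
window = n_prev + 1  # the rolling window also contains the current row

# raw=True hands each window to the lambda as a NumPy array,
# so group[-1] is the current price and group[:-1] the previous ones.
df['ct_prev10'] = df['price'].rolling(window).apply(
    lambda group: (group[:-1] > group[-1]).sum(), raw=True
)
print(df)
```

With the 11 sample prices only the last row has a full window, and it gets a count of 3, which matches the expected result in the question.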




Answer 2:


Here's a vectorized approach that uses NumPy broadcasting to build all the windows at once -

import numpy as np
import pandas as pd

# Sample input dataframe
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

# Convert to numpy array for counting purposes
A = np.array(df['price'])

W = 5 # Window size

# Initialize another column for storing counts
df['price_count'] = np.nan

# Get counts and store as a new column in dataframe
C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
df.loc[W-1:, 'price_count'] = C  # .loc avoids chained assignment
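
The one-line count above packs several steps together. As a rough breakdown, the sketch below splits it into named pieces. The variable names starts, offsets, idx, previous and current are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np

A = np.array([50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51])
W = 5  # window size, including the current row

# A column of window start positions plus a row of offsets 0..W-2:
# broadcasting yields one row of indices per window, excluding the current row.
starts = np.arange(A.size - W + 1)[:, None]  # shape (7, 1)
offsets = np.arange(W - 1)                   # shape (4,)
idx = starts + offsets                       # shape (7, 4)

previous = A[idx]             # the W-1 previous prices of each window
current = A[W - 1:][:, None]  # the current price of each window, as a column

# Compare every previous price with its window's current price, then count per row.
counts = (previous > current).sum(axis=1)
print(idx[0])  # [0 1 2 3]
print(counts)  # [1 4 2 0 1 4 2]
```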

Sample run -

>>> df
    price
0    50.0
1    51.0
2    52.0
3    50.5
4    51.0
5    50.0
6    50.5
7    53.0
8    52.0
9    49.0
10   51.0
>>> A = np.array(df['price'])
>>> W = 5 # Window size
>>> df['price_count'] = np.nan
>>> 
>>> C=(A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
>>> df.loc[W-1:, 'price_count'] = C
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2
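
On NumPy 1.20 or later, the window matrix can also be built with numpy.lib.stride_tricks.sliding_window_view instead of manual index arithmetic. This is a swapped-in alternative rather than the answer's own method; the sketch uses the question's 10 previous rows, and n_prev and ct_prev10 are hypothetical names.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})
n_prev = 10  # previous rows to compare against, as in the question

A = df['price'].to_numpy()
# Each row of `windows` holds n_prev previous prices followed by the current price.
windows = sliding_window_view(A, n_prev + 1)
counts = (windows[:, :-1] > windows[:, -1:]).sum(axis=1)

# The first n_prev rows have no full history, so they stay NaN.
result = np.full(A.size, np.nan)
result[n_prev:] = counts
df['ct_prev10'] = result
print(df)
```

sliding_window_view returns a view on the original array, so it avoids materializing the fancy-indexed copy; only the boolean comparison allocates a windows-sized array.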


Source: https://stackoverflow.com/questions/33289664/comparing-single-dataframe-value-to-previous-10-in-same-column
