问题
In a dataframe, I would like to count how many of the prices from the previous 10 days are greater than today's price. Result would look like this:
price ct>prev10
50.00
51.00
52.00
50.50
51.00
50.00
50.50
53.00
52.00
49.00
51.00 3
I have seen this post answered by DSM, but the requirement was different in that the base for comparison was a static number as opposed to the current row:
Achieving "countif" with pd.rolling_sum()
Of course I would like to do this without looping through 1x1. Pretty much stumped - thanks in advance for any advise.
回答1:
You can use a rolling_apply function on the series. I used a window length of 5 given the small size of the sample data, but you can easily change it.
The lambda function counts the number of items in the rolling group (excluding the last item) is greater than the last item.
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})
window = 5 # Given that sample data only contains 11 values.
df['price_count'] = pd.rolling_apply(df.price, window,
lambda group: sum(group[:-1] > group[-1]))
>>> df
price price_count
0 50.0 NaN
1 51.0 NaN
2 52.0 NaN
3 50.5 NaN
4 51.0 1
5 50.0 4
6 50.5 2
7 53.0 0
8 52.0 1
9 49.0 4
10 51.0 2
In the example above, the first group is the prices with index values 0-4. You can see what is happening with:
group = df.price[:window].values
>>> group
array([ 50. , 51. , 52. , 50.5, 51. ])
Now, do your comparison of the previous four prices to the current price:
>>> group[:-1] > group[-1]
array([False, False, True, False], dtype=bool)
Then, you are just summing the boolean values:
>>> sum(group[:-1] > group[-1])
1
This is the value that gets put into the first closing window at index 4.
回答2:
Here's a vectoized approach with NumPy module that supports broadcasting for implementing vectorized methods -
import numpy as np
import pandas as pd
# Sample input dataframe
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})
# Convert to numpy array for counting purposes
A = np.array(df['price'])
W = 5 # Window size
# Initialize another column for storing counts
df['price_count'] = np.nan
# Get counts and store as a new column in dataframe
C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
df['price_count'][W-1:] = C
Sample run -
>>> df
price
0 50.0
1 51.0
2 52.0
3 50.5
4 51.0
5 50.0
6 50.5
7 53.0
8 52.0
9 49.0
10 51.0
>>> A = np.array(df['price'])
>>> W = 5 # Window size
>>> df['price_count'] = np.nan
>>>
>>> C=(A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
>>> df['price_count'][W-1:] = C
>>> df
price price_count
0 50.0 NaN
1 51.0 NaN
2 52.0 NaN
3 50.5 NaN
4 51.0 1
5 50.0 4
6 50.5 2
7 53.0 0
8 52.0 1
9 49.0 4
10 51.0 2
来源:https://stackoverflow.com/questions/33289664/comparing-single-dataframe-value-to-previous-10-in-same-column