How to make Default Choice for np.select() a Previous Value of an Array, Series, or DataFrame

Question

I am using np.select() to construct an ndarray with values of either 1, -1, or 0, depending on some conditions. It is possible that none of these will be met, so I need a default value. I would like this value to be the value that the array holds in the previous index, if that makes sense. My naive code, which runs on some columns of a DataFrame named "total" and which raises an error, is below:

condlist = [total.ratios > total.s_entry, total.ratios < total.b_entry, (total.ratios > total.b_entry) & (total.ratios < total.s_entry)]
choicelist = (-1, 1, 0)
pos1 = pd.Series(np.select(condlist, choicelist, pos1))

Is there a way to do what I am asking? For example, having the array start

and then the sixth element doesn't satisfy any of the conditions, so its value defaults to -1 due to that being the most recent value of the array?

Answer 1:

I am not sure if you will be happy with this solution but you could assign some default value and then change it to what you want while iterating:

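import numpy as np
import pandas as pd

x = np.arange(20)

# The trailing True condition catches every element that no earlier
# condition matched and marks it with None as a placeholder.
condlist = [x < 4, np.logical_and(x > 8, x < 15), x > 15, True]
choicelist = (-1, 1, 0, None)
pos1 = pd.Series(np.select(condlist, choicelist, x))

# Replace each placeholder with the value at the previous index.
for index, row in pos1.items():
    if row is None and index == 0:
        pass # Not sure what you want to do here
    elif row is None:
        pos1.at[index] = pos1.at[index-1]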
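The same "carry the previous value forward" idea can be sketched without a Python loop. The version below is only an illustration, not the answerer's code: it assumes NaN is an acceptable placeholder (so the result is a float array), and the names `raw`, `matched`, `last_idx`, and `filled` are made up for this example.

```python
import numpy as np

x = np.arange(20)
condlist = [x < 4, (x > 8) & (x < 15), x > 15]
choicelist = [-1, 1, 0]

# NaN marks the positions where no condition matched (assumed placeholder).
raw = np.select(condlist, choicelist, default=np.nan)

# For each position, find the index of the most recent matched position.
matched = ~np.isnan(raw)
last_idx = np.where(matched, np.arange(len(raw)), 0)
last_idx = np.maximum.accumulate(last_idx)

# Look up the value at that index; unmatched leading positions stay NaN.
filled = raw[last_idx]
```

Unlike a single-step lookup, this also handles runs of consecutive unmatched elements, because every position points back to the last element that did match.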

Answer 2:

Try leaving None as the default value in np.select

Then you can fill them using .fillna() method which accepts pd.Series as an argument for index-wise filling.

In your case the argument is the same series with shifted index (it can be done using deque .rotate() method). Hope this works for you:

from collections import deque

import numpy as np
import pandas as pd

condlist = [total.ratios > total.s_entry, total.ratios < total.b_entry, (total.ratios > total.b_entry) & (total.ratios < total.s_entry)]
choicelist = (-1, 1, 0)

pos1 = pd.Series(np.select(condlist, choicelist, None))

pos1_index_shift = deque(pos1.index) # [0, 1, 2, ...]
pos1_index_shift.rotate(-1) # [1, 2, ..., 0] - done inplace

# Relabel a copy so that label i holds the value from position i - 1
pos1_prev = pos1.copy()
pos1_prev.index = pos1_index_shift

# Only looks one step back, so in a run of consecutive missing values
# only the first one is filled
pos1 = pos1.fillna(pos1_prev)
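A shorter way to get the same effect, sketched here as an alternative rather than as this answer's method, is to use NaN as the default and let pandas forward-fill it with `Series.ffill()`. The `total` DataFrame below is invented sample data, chosen so that a few rows sit exactly on a threshold and therefore match no condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; ratios equal to a threshold match no condition.
total = pd.DataFrame({
    "ratios":  [2.0, 0.5, 1.0, 1.5, 2.5, 1.5, 0.8, 0.2],
    "s_entry": [1.5] * 8,
    "b_entry": [0.8] * 8,
})

condlist = [
    total.ratios > total.s_entry,
    total.ratios < total.b_entry,
    (total.ratios > total.b_entry) & (total.ratios < total.s_entry),
]
choicelist = (-1, 1, 0)

# NaN marks "no condition met"; ffill() copies the last valid value forward.
pos1 = pd.Series(np.select(condlist, choicelist, default=np.nan), index=total.index)
pos1 = pos1.ffill()
```

Forward fill handles consecutive unmatched rows as well. If the very first row matches nothing, it stays NaN, since there is no earlier value to copy.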

Answer 3:

I had the same problem, but I didn't want to build a complicated mechanism like the ones in the other responses here just to handle the default value, given that I already had a working version using .loc.

I simply tried passing the DataFrame column (a Series) as the default, so that rows which were already populated keep their value, and it worked:

    # e.g. if task_type is not NaN, it already has a value of "C"
    # that I want to keep

    conditions = [
        result_df["task_type"].isna() & result_df["maintenance_task"],
        result_df["task_type"].isna(),
    ]

    choices = ["A", "B"]

    result_df["task_type"] = np.select(conditions, choices, default=result_df["task_type"])

I noticed this approach was slightly more performant than the one I had working with .loc and it would scale/read better in the long run if more conditions appear.
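To make the behaviour concrete, here is a small self-contained sketch of this approach. The contents of `result_df` are invented for illustration; only the column names come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 3 are already populated with "C".
result_df = pd.DataFrame({
    "task_type": ["C", np.nan, np.nan, "C"],
    "maintenance_task": [False, True, False, True],
})

conditions = [
    result_df["task_type"].isna() & result_df["maintenance_task"],
    result_df["task_type"].isna(),
]
choices = ["A", "B"]

# Rows that match no condition fall back to their own current value.
result_df["task_type"] = np.select(
    conditions, choices, default=result_df["task_type"]
)
```

Note that the default here is taken from the same row, so each populated row keeps its own existing value rather than inheriting the value of the previous row.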

Source: https://stackoverflow.com/questions/61990742/how-to-make-default-choice-for-np-select-a-previous-value-of-an-array-series

Tags

python

pandas

numpy

numpy-ndarray