Question
I recently asked a question that was answered (How do I add conditionally to a selection of cells in a pandas dataframe column when the column is a series of lists?), but I believe I have a new problem that I had not previously considered.
In the following dataframe, two conditions should each result in a change to column d. Each value in column d is a list.
- Where a == b, the final integer in d is incremented by one.
- Where a != b, the list of integers is extended: the value 1 is appended to the end of the list in column d.

a    b    c    d
On   On   [0]  [0,3]
On   Off  [0]  [0,1]
On   On   [0]  [2]
On   On   [0]  [0,4,4]
On   Off  [0]  [0]
As a result, the dataframe would look like this:
a    b    c    d
On   On   [0]  [0,4]
On   Off  [0]  [0,1,1]
On   On   [0]  [3]
On   On   [0]  [0,4,5]
On   Off  [0]  [0,1]
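For anyone who wants to reproduce the problem, the sample data above can be built roughly as follows. This is only a sketch: the values are copied from the two tables, and the name expected_d is a hypothetical helper introduced here for comparison, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question; every cell in "c" and "d" is a Python list.
df = pd.DataFrame({
    "a": ["On", "On", "On", "On", "On"],
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0], [0], [0], [0], [0]],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

# Desired content of column d after the update (hypothetical name, for checking only).
expected_d = [[0, 4], [0, 1, 1], [3], [0, 4, 5], [0, 1]]
```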
I realise that this can be done using the pd.Series.apply method in conjunction with a predefined function or a lambda. However, the dataframe consists of 100000 rows, and I was hoping that a vectorized solution to these two conditions might exist.
Answer 1:
As Edchum says, a vectorised solution can be problematic.
One non-vectorized solution uses apply with custom functions:
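# Temporary column holding the original lists, so both updates read the same input.
df['e'] = df['d']

def exten(lst):
    # a != b: return a new list with 1 appended
    return lst + [1]

def incre(lst):
    # a == b: return a new list with the final integer incremented
    return lst[:-1] + [lst[-1] + 1]

df.loc[df.a != df.b, 'd'] = df.e.apply(exten)
df.loc[df.a == df.b, 'd'] = df.e.apply(incre)
df = df.drop('e', axis=1)
print(df)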
    a    b    c          d
0  On   On  [0]     [0, 4]
1  On  Off  [0]  [0, 1, 1]
2  On   On  [0]        [3]
3  On   On  [0]  [0, 4, 5]
4  On  Off  [0]     [0, 1]
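As a possible variation (not part of the original answer), the same two rules can be applied in one pass with a list comprehension over a boolean mask. This is a sketch under stated assumptions: every list in d is non-empty, and the helper name update is hypothetical. Only the a == b comparison is vectorized here; the list updates still run as a Python-level loop.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["On", "On", "On", "On", "On"],
    "b": ["On", "Off", "On", "On", "Off"],
    "c": [[0], [0], [0], [0], [0]],
    "d": [[0, 3], [0, 1], [2], [0, 4, 4], [0]],
})

def update(lst, same):
    """Return a new list; assumes lst is non-empty (hypothetical helper)."""
    if same:
        # a == b: increment the final integer
        return lst[:-1] + [lst[-1] + 1]
    # a != b: append the value 1
    return lst + [1]

# Vectorized comparison of the two columns, one boolean per row.
same = (df["a"] == df["b"]).tolist()

# Build the new lists row by row, then assign them back aligned on the index.
new_d = [update(lst, s) for lst, s in zip(df["d"], same)]
df["d"] = pd.Series(new_d, index=df.index)

print(df)
```

Because update returns new lists rather than mutating its input, the original list objects are left unchanged, and no temporary column is needed.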
Source: https://stackoverflow.com/questions/35287780/vectorized-solution-to-conditional-dataframe-selection