Question
ykp.data
Out[182]:
state action reward
0 [41] 5 59
1 [5] 52 48
2 [46] 35 59
3 [42] 16 12
4 [43] 37 48
5 [36] 5 59
6 [49] 52 48
7 [39] 11 23
I would like to find the row whose state entry is [42], so I ran
ykp.data.query('state == [42]')
but I get
Empty DataFrame
Columns: [state, action, reward]
Index: []
when I should be seeing the row [42], 16, 12.
Can someone please tell me how I can work around this? I need my state values to be stored as arrays.
Answer 1:
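It's best to avoid pd.Series.apply here. Instead, you can use itertools.chain to flatten the column into a regular NumPy array, then compare that array to an integer to get a Boolean array for indexing (this works because every state is a one-element list, so the flattened array lines up row for row):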
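import numpy as np
import pandas as pd
from itertools import chain

# 100,000 rows, each state a one-element list such as [42]
df = pd.DataFrame(np.random.randint(0, 100, size=(100000, 1)), columns=['state'])
df = df.assign(state=df.state.apply(lambda x: [x]))

def wen(df):
    # The astype(str) approach from Answer 2; assign() works on a copy,
    # so the caller's list column is left intact
    df = df.assign(state=df.state.astype(str))
    return df.query("state == '[42]'")

# Timings as reported in the answer (IPython %timeit)
%timeit df[np.array(list(chain.from_iterable(df['state'].values))) == 42] # 14.2 ms
%timeit df[df.state.apply(tuple) == (42,)] # 41.9 ms
%timeit df.loc[df.state.apply(lambda x: x == [42])] # 33.9 ms
%timeit wen(df) # 19.9 ms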
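A minimal sketch that wraps the flatten-and-compare approach in a reusable function. The name `rows_with_state` is hypothetical, and the length check is an added safeguard rather than part of the answer: it only verifies that the flattened array has one value per row, which is what the Boolean mask needs in order to align.

```python
from itertools import chain

import numpy as np
import pandas as pd


def rows_with_state(df, value):
    """Hypothetical helper: flatten df['state'] with itertools.chain and
    compare the resulting NumPy array against a scalar value."""
    flat = np.array(list(chain.from_iterable(df['state'])))
    # Rough safeguard (not in the answer): the mask can only align with
    # the rows if the flattened array holds exactly one value per row
    if flat.shape[0] != len(df):
        raise ValueError("every entry in 'state' must be a one-element list")
    return df[flat == value]


# Data copied from the question's output
df = pd.DataFrame({
    'state': [[41], [5], [46], [42], [43], [36], [49], [39]],
    'action': [5, 52, 35, 16, 37, 5, 52, 11],
    'reward': [59, 48, 59, 12, 48, 59, 48, 23],
})
print(rows_with_state(df, 42))
```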
Better still, don't store lists in your DataFrame at all; use a regular int series instead, which is more efficient in both memory and performance.
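Following that advice, a sketch that rebuilds the question's data and unwraps each one-element list into an ordinary integer column once, after which a plain `query('state == 42')` behaves as expected. It assumes every state holds exactly one value; the variable name `flat` is illustrative.

```python
import pandas as pd

# Data copied from the question's output, with each state as a one-element list
df = pd.DataFrame({
    'state': [[41], [5], [46], [42], [43], [36], [49], [39]],
    'action': [5, 52, 35, 16, 37, 5, 52, 11],
    'reward': [59, 48, 59, 12, 48, 59, 48, 23],
})

# Unwrap the single value from each list, giving an ordinary int64 column
flat = df.assign(state=df['state'].map(lambda s: s[0]).astype('int64'))

# Plain integer comparisons (and query) now work as expected
match = flat.query('state == 42')
print(match)
```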
Answer 2:
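You can convert the column with astype(str) and query for the string '[42]' (this replaces the lists in state with their string form):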
df.state=df.state.astype(str)
df.query("state == '[42]'")
Out[290]:
state action reward
3 [42] 16 12
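The assignment above overwrites the lists in `state` with strings. If the lists need to stay in place, here is a sketch of the same string comparison done as a temporary Boolean mask. It relies on `str([42])` producing exactly `'[42]'`, so the match is on the text form rather than the values.

```python
import pandas as pd

df = pd.DataFrame({
    'state': [[41], [5], [46], [42]],
    'action': [5, 52, 35, 16],
    'reward': [59, 48, 59, 12],
})

# Build the string form only for the comparison; df['state'] keeps its lists
mask = df['state'].astype(str) == '[42]'
result = df[mask]
print(result)
```

Because the match is textual, it is sensitive to formatting: a multi-element list stringifies as '[4, 2]', with a space after the comma, so the query string has to be written the same way.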
Answer 3:
print(df[df.state.apply(tuple) == (42,)])
state action reward
3 [42] 16 12
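The tuple conversion is not limited to one-element lists. Below is a sketch that generalizes it to an arbitrary target list; `rows_matching` is a hypothetical name. It relies on pandas treating the tuple on the right-hand side as a single value to compare each row against, whereas a list there would be read as one value per row.

```python
import pandas as pd


def rows_matching(df, column, target):
    """Hypothetical helper: rows whose list-valued `column` equals `target`.

    Each cell is converted to a tuple and compared with tuple(target),
    so lists of any length can be matched.
    """
    return df[df[column].apply(tuple) == tuple(target)]


df = pd.DataFrame({
    'state': [[4, 2], [42], [4, 2], [2, 4]],
    'action': [1, 2, 3, 4],
})
print(rows_matching(df, 'state', [4, 2]))
print(rows_matching(df, 'state', [42]))
```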
Another solution (from a comment by @user3483203):
df.loc[df.state.apply(lambda x: x==[42])]
But the tuple version is about 14% faster in this benchmark (24.8 ms vs. 28.8 ms per loop):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(100000, 1)), columns=['state'])
df = df.assign(state=df.state.apply(lambda x: [x]))
%timeit df[df.state.apply(tuple) == (42,)]
10 loops, best of 3: 24.8 ms per loop
%timeit df.loc[df.state.apply(lambda x: x==[42])]
10 loops, best of 3: 28.8 ms per loop
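The `%timeit` lines above only run inside IPython or Jupyter. A sketch of the same comparison using the standard-library `timeit` module, which also checks that all four approaches select the same rows. The seed, repeat counts, and dictionary keys are arbitrary choices, and absolute timings will vary with the machine and pandas version.

```python
import timeit
from itertools import chain

import numpy as np
import pandas as pd

# Same shape of test data as the benchmarks: one-element lists in 'state'
rng = np.random.default_rng(0)
df = pd.DataFrame({'state': rng.integers(0, 100, size=100_000)})
df = df.assign(state=df['state'].apply(lambda x: [x]))

# Each candidate returns the rows whose state is [42]
approaches = {
    'chain': lambda: df[np.array(list(chain.from_iterable(df['state']))) == 42],
    'tuple': lambda: df[df['state'].apply(tuple) == (42,)],
    'list_eq': lambda: df.loc[df['state'].apply(lambda x: x == [42])],
    'astype_str': lambda: df[df['state'].astype(str) == '[42]'],
}

# Sanity check: every approach must select the same rows
expected = approaches['chain']().index
for name, fn in approaches.items():
    assert fn().index.equals(expected), name

# Best of 3 repeats of 5 calls each, converted to ms per call
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5 * 1000
    for name, fn in approaches.items()
}
for name, ms in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:>10}: {ms:.1f} ms')
```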
Source: https://stackoverflow.com/questions/51488681/pandas-query-with-a-column-consisting-of-array-entries