subsetting a Python DataFrame

喜你入骨 提交于 2019-11-28 15:57:56
Phillip Cloud

I'll assume that Time and Product are columns in a DataFrame, df is an instance of DataFrame, and that other variables are scalar values:

For now, you'll have to reference the DataFrame instance:

k1 = df.loc[(df.Product == p_id) & (df.Time >= start_time) & (df.Time < end_time), ['Time', 'Product']]

The parentheses are also necessary, because of the precedence of the & operator vs. the comparison operators. The & operator is actually an overloaded bitwise operator which has the same precedence as arithmetic operators which in turn have a higher precedence than comparison operators.

In pandas 0.13 a new experimental DataFrame.query() method will be available. It's extremely similar to subset modulo the select argument:

With query() you'd do it like this:

df[['Time', 'Product']].query('Product == p_id and Month < mn and Year == yr')

Here's a simple example:

In [9]: df = DataFrame({'gender': np.random.choice(['m', 'f'], size=10), 'price': poisson(100, size=10)})

In [10]: df
Out[10]:
  gender  price
0      m     89
1      f    123
2      f    100
3      m    104
4      m     98
5      m    103
6      f    100
7      f    109
8      f     95
9      m     87

In [11]: df.query('gender == "m" and price < 100')
Out[11]:
  gender  price
0      m     89
4      m     98
9      m     87

The final query that you're interested will even be able to take advantage of chained comparisons, like this:

k1 = df[['Time', 'Product']].query('Product == p_id and start_time <= Time < end_time')
sernle

Just for someone looking for a solution more similar to R:

df[(df.Product == p_id) & (df.Time> start_time) & (df.Time < end_time)][['Time','Product']]

No need for data.loc or query, but I do think it is a bit long.

I've found that you can use any subset condition for a given column by wrapping it in []. For instance, you have a df with columns ['Product','Time', 'Year', 'Color']

And let's say you want to include products made before 2014. You could write,

df[df['Year'] < 2014]

To return all the rows where this is the case. You can add different conditions.

df[df['Year'] < 2014][df['Color' == 'Red']

Then just choose the columns you want as directed above. For instance, the product color and key for the df above,

df[df['Year'] < 2014][df['Color' == 'Red'][['Product','Color']]
Santosh Vutukuri

Creating an Empty Dataframe with known Column Name:

Names = ['Col1','ActivityID','TransactionID']
df = pd.DataFrame(columns = Names)

Creating a dataframe from csv:

df = pd.DataFrame('...../file_name.csv')

Creating a dynamic filter to subset a dtaframe:

i = 12
df[df['ActivitiID'] <= i]

Creating a dynamic filter to subset required columns of dtaframe

df[df['ActivityID'] == i][['TransactionID','ActivityID']]
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!