Grouping variables based on conditions

落爺英雄遲暮 submitted on 2019-12-11 16:55:32

Question


Grouping the following data in 64 groups. I have two variables x and y for each object. I would like to group them up based on a condition. Both x and y have a range between 0 and 2000 and I want to break them into 64 groups. The first one to have x<250 and y<250 the next one 250

Sample data:
index  x    y
1      10   100
2      270  60
3      550  1000
4      658  1900
5      364  810
6      74   1890
...
6000   64   71

Could you please tell me a way to do it? My data is currently in a data frame, but I do not know if that is the way to go. Some colleagues told me to avoid using loops on data frames. I also attached a picture of what my scatterplot looks like, which may help you visualize my data. Thank you in advance!


Answer 1:


Use pd.cut() to bin x and y into categories, then combine the two bin indices into a single group number. How you combine them depends on the ordering you want; my code below simply numbers the cells left to right within each row, with the rows running from bottom to top.

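import numpy as np
import pandas as pd
# Bin edges 0, 250, ..., 2000 give 8 bins per axis, labelled 0-7.
bins = [250 * i for i in range(9)]
labels = list(range(8))
# pd.cut defaults to right-closed intervals: (0, 250], (250, 500], ...
df['x_bin'] = pd.cut(df['x'], bins, labels=labels)
df['y_bin'] = pd.cut(df['y'], bins, labels=labels)
# Cell number 0-63: column index plus 8 times the row index.
df['group'] = df['x_bin'].astype(np.int8) + df['y_bin'].astype(np.int8).multiply(8)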
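
To make the numbering concrete, here is a small self-contained run of the same steps on the six fully shown sample rows from the question. It is only a sketch: the DataFrame is rebuilt by hand from the sample, and it assumes the default pd.cut() behaviour (right-closed intervals).

```python
import numpy as np
import pandas as pd

# The six fully shown sample rows from the question.
df = pd.DataFrame({
    "x": [10, 270, 550, 658, 364, 74],
    "y": [100, 60, 1000, 1900, 810, 1890],
})

bins = [250 * i for i in range(9)]   # 0, 250, ..., 2000
labels = list(range(8))              # bin indices 0-7

x_bin = pd.cut(df["x"], bins, labels=labels)
y_bin = pd.cut(df["y"], bins, labels=labels)

# pd.cut returns a categorical Series, hence the cast before doing arithmetic.
df["group"] = x_bin.astype(np.int8) + y_bin.astype(np.int8).multiply(8)

groups = df["group"].tolist()
print(groups)  # [0, 1, 26, 58, 25, 56]
```

The third row shows the edge behaviour: y = 1000 falls in the interval (750, 1000], so it gets row index 3 and the group is 2 + 8 * 3 = 26.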

Note that the .astype(np.int8) calls are needed because pd.cut() returns a categorical Series, which does not support arithmetic; casting the integer labels to int8 makes the addition and multiplication possible. If you don't want to store the intermediate bin columns, all of this can be done in one line by replacing the column references in the df['group'] assignment with the pd.cut() calls themselves:

df['group'] = pd.cut(df['x'], bins, labels=labels).astype(np.int8) + pd.cut(df['y'], bins, labels=labels).astype(np.int8).multiply(8)
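
One edge case is worth checking. With the default right-closed intervals, a value of exactly 0 falls outside every bin and becomes NaN, which makes the .astype(np.int8) cast fail. The question's first cell (x<250 and y<250) also suggests left-closed bins. The sketch below is a different technique from the answer's: it uses floor division instead of pd.cut(). The helper name grid_group, its parameters, and the choice to fold a value of exactly 2000 into the top bin are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def grid_group(df, width=250, n_bins=8):
    """Hypothetical helper: number grid cells 0..n_bins**2 - 1 via floor division."""
    # x // width yields left-closed bins: [0, 250) -> 0, [250, 500) -> 1, ...
    # clip() keeps indices in 0..n_bins - 1, so exactly 2000 lands in the top bin
    # (an assumption, not something the question specifies).
    x_bin = (df["x"] // width).clip(0, n_bins - 1).astype(np.int8)
    y_bin = (df["y"] // width).clip(0, n_bins - 1).astype(np.int8)
    return x_bin + n_bins * y_bin

# Made-up rows that exercise the bin edges.
edge_cases = pd.DataFrame({
    "x": [10, 270, 550, 0, 2000, 250],
    "y": [100, 60, 1000, 0, 2000, 249],
})
edge_cases["group"] = grid_group(edge_cases)

# Number of points per occupied cell.
counts = edge_cases.groupby("group").size()
print(edge_cases["group"].tolist())  # [0, 1, 34, 0, 63, 1]
```

Here y = 1000 lands in row 4 rather than row 3, so which convention you pick matters for points that sit exactly on a bin edge.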


Source: https://stackoverflow.com/questions/58859869/grouping-variables-based-on-conditions
