Grouping variables based on conditions

Question

I have two variables, x and y, for each object, and I would like to split the objects into 64 groups based on a condition. Both x and y range between 0 and 2000. The first group should have x<250 and y<250, the next one 250

Sample data:
index  x    y
1      10   100
2      270  60
3      550  1000
4      658  1900
5      364  810
6      74   1890
...
6000   64   71

Could you please tell me a way to do it? My data is currently in a data frame, but I do not know if that is the way to go. Some colleagues told me to avoid using loops on data frames. I also attached a picture of my scatterplot, which might help you visualize my data. Thank you in advance!

Answer 1:

Use pd.cut() to bin your variables into x and y categories, and then construct the group number from the two bin indices according to some logic. The logic depends on whether you want a specific order; my code below simply numbers the cells from left to right within each row, with rows running from bottom to top.

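import numpy as np
import pandas as pd

bins = [250 * i for i in range(9)]  # bin edges 0, 250, ..., 2000
labels = list(range(8))             # integer label for each of the 8 bins
df['x_bin'] = pd.cut(df['x'], bins, labels=labels)
df['y_bin'] = pd.cut(df['y'], bins, labels=labels)
# group = x bin index + 8 * y bin index, giving a value from 0 to 63
df['group'] = df['x_bin'].astype(np.int8) + df['y_bin'].astype(np.int8).multiply(8)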
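
To make the binning concrete, here is a minimal, self-contained sketch that applies the same pd.cut() approach to the sample rows from the question. The data frame is rebuilt by hand from those rows (the elided rows are left out), so treat it as an illustration rather than the asker's real data.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the rows hidden behind "..." are omitted.
df = pd.DataFrame(
    {
        "x": [10, 270, 550, 658, 364, 74, 64],
        "y": [100, 60, 1000, 1900, 810, 1890, 71],
    },
    index=[1, 2, 3, 4, 5, 6, 6000],
)

bins = [250 * i for i in range(9)]  # bin edges 0, 250, ..., 2000
labels = list(range(8))             # integer label for each of the 8 bins

# pd.cut() returns categorical data, so cast to integers before doing math.
x_bin = pd.cut(df["x"], bins, labels=labels).astype(np.int8)
y_bin = pd.cut(df["y"], bins, labels=labels).astype(np.int8)

# Cell number: x bin index plus 8 times the y bin index.
df["group"] = x_bin + y_bin * 8

print(df)
```

For these seven rows the group column comes out as 0, 1, 26, 58, 25, 56 and 0. For example, the row with x=550 and y=1000 lands in x bin 2 and y bin 3, so its group is 2 + 3 * 8 = 26.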

Note that the .astype(np.int8) calls are needed because pd.cut() returns a categorical Series, which does not support arithmetic; casting to an integer type makes the addition and multiplication possible. If you don't want to store the intermediate bin assignments, you can do all of this in one line by replacing the column references in the df['group'] assignment with the pd.cut() calls themselves:

df['group'] = pd.cut(df['x'], bins, labels=labels).astype(np.int8) + pd.cut(df['y'], bins, labels=labels).astype(np.int8).multiply(8)
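
One detail worth checking is how values that sit exactly on a bin edge are handled. By default pd.cut() uses right-closed intervals such as (0, 250], so a value of exactly 0 falls outside every bin and becomes NaN, and 250 lands in the first bin rather than the second. The sketch below is a possible variant, not part of the original answer: it passes right=False to get half-open bins such as [0, 250), which seems closer to the "x<250" wording in the question, and labels=False so that pd.cut() returns integer bin codes directly. The small data frame is made up for illustration, and the sketch assumes all values are strictly below 2000, because with right=False a value of exactly 2000 would fall outside the last bin.

```python
import pandas as pd

# Made-up rows chosen to include values on bin edges (0, 250, 1000).
df = pd.DataFrame({"x": [0, 249, 250, 550], "y": [0, 100, 60, 1000]})

bins = [250 * i for i in range(9)]  # bin edges 0, 250, ..., 2000

# right=False  -> bins are [0, 250), [250, 500), ..., [1750, 2000)
# labels=False -> integer bin codes instead of a categorical column
x_bin = pd.cut(df["x"], bins, right=False, labels=False)
y_bin = pd.cut(df["y"], bins, right=False, labels=False)
df["group"] = x_bin + 8 * y_bin

# Number of rows in each occupied cell, computed without an explicit loop.
group_sizes = df.groupby("group").size()

print(df)
print(group_sizes)
```

Here the rows with x=0 and x=249 both fall into group 0, while x=250 moves to group 1. Once the group column exists, groupby("group") gives per-cell summaries without looping over the rows.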

Source: https://stackoverflow.com/questions/58859869/grouping-variables-based-on-conditions

Tags

python

loops

dataframe

group-by

conditional-statements