Python: Binning based on 2 columns in Pandas

Submitted by 末鹿安然 on 2021-02-05 06:11:05

Question


I'm looking for a quick and elegant way to bin records based on two columns in pandas.

Here's my data frame:

                              filename  height   width
0        shopfronts_23092017_3_285.jpg   750.0   560.0
1                   shopfronts_200.jpg  4395.0  6020.0
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0
3                   shopfronts_101.jpg   480.0   640.0
4                   shopfronts_138.jpg  3733.0  8498.0
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0
6      shopfronts_25092017_neon_33.jpg   100.0   200.0
7                   shopfronts_322.jpg   682.0  1024.0
8                   shopfronts_171.jpg   800.0   600.0
9         shopfronts_23092017_3_35.jpg   120.0   210.0

I need to bin the records based on two columns, height and width (the image resolution).

I'm looking for something like this:

                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1

where

g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that do not meet any of the requirements above (e.g. records 2, 3 and 7, where height and width do not fall into the same category)

I want to get the frequency count using the group column. If there is a better way to go about this, kindly let me know.


Answer 1:


Use nested np.where calls:

In [4510]: df['group'] = np.where((df.height <= 400) & (df.width <= 300),
      ...:          'g1',
      ...:          np.where((df.height <= 640) & (df.width <= 480),
      ...:          'g2',
      ...:          np.where((df.height <= 800) & (df.width <= 600),
      ...:          'g3',
      ...:          np.where((df.height > 800) & (df.width > 600),
      ...:          'g4',
      ...:          'others'))))

In [4511]: df
Out[4511]:
                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1
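
The nested np.where above checks cumulative upper bounds, so a record such as 414x250 (height in the g2 range, width in the g1 range) would land in g2 rather than others. If the half-open intervals from the question should be enforced on both columns, np.select with explicit lower and upper bounds is one option. A minimal sketch, assuming the first number in each group label is the height and the second the width; the helper name assign_group is hypothetical, and the filename column is left out for brevity:

```python
import numpy as np
import pandas as pd

# Heights and widths copied from the question's data frame
df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0,
               187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0,
              250.0, 200.0, 1024.0, 600.0, 210.0],
})


def assign_group(frame):
    """Hypothetical helper: label rows whose height and width share a bin."""
    h, w = frame["height"], frame["width"]
    conditions = [
        (h <= 400) & (w <= 300),                          # g1
        (h > 400) & (h <= 640) & (w > 300) & (w <= 480),  # g2
        (h > 640) & (h <= 800) & (w > 480) & (w <= 600),  # g3
        (h > 800) & (w > 600),                            # g4
    ]
    choices = ["g1", "g2", "g3", "g4"]
    # Rows matching none of the conditions fall back to "others"
    return np.select(conditions, choices, default="others")


df["group"] = assign_group(df)
print(df)
```

On the ten sample records this gives the same labels as the nested np.where; the two only differ for records whose height and width sit in different bins without either exceeding the larger bound.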



Answer 2:


You can use pd.cut twice, once per column, and keep the label only where both bins agree, i.e.

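import numpy as np
import pandas as pd

# Bin each column separately, using the same labels for both
bins = [0, 400, 640, 800, np.inf]
df['group'] = pd.cut(df['height'].values, bins, labels=["g1", "g2", "g3", "g4"])

nbin = [0, 300, 480, 600, np.inf]
t = pd.cut(df['width'].values, nbin, labels=["g1", "g2", "g3", "g4"])

# Keep the label only where the height bin and the width bin agree
df['group'] = np.where(df['group'] == t, df['group'], 'others')

Output:

                              filename  height   width   group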
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1
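
For the frequency count the question asks for, Series.value_counts on the group column is probably the simplest route. A minimal sketch, assuming the group column has already been filled in as in the output above (the labels are typed in directly here so the snippet runs on its own); the reindex step is an optional addition that keeps the empty g2 bucket visible:

```python
import pandas as pd

# Group labels copied from the output above, in row order 0..9
df = pd.DataFrame({
    "group": ["g3", "g4", "others", "others", "g4",
              "g1", "g1", "others", "g3", "g1"],
})

# Frequency of each label that actually occurs in the column
counts = df["group"].value_counts()

# Optional: fixed label order that also lists labels with zero records
all_labels = ["g1", "g2", "g3", "g4", "others"]
full_counts = counts.reindex(all_labels, fill_value=0)

print(full_counts)
```

If the column is kept as a categorical instead of plain strings, value_counts can report unused categories on its own, so the reindex step may not be needed in that case.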


Source: https://stackoverflow.com/questions/46472809/python-binning-based-on-2-columns-in-pandas
