Compute percentage for each row in pandas dataframe

Anonymous (unverified), submitted 2019-12-03 01:34:02

Question:

            country_name  country_code  val_code      y191      y192      y193      y194      y195
United States of America           231         1  47052179  43361966  42736682  43196916  41751928
United States of America           231         2   1187385   1201557   1172941   1176366   1192173
United States of America           231         3  28211467  27668273  29742374  27543836  28104317
United States of America           231         4    179000    193000    233338    276639    249688
United States of America           231         5  12613922  12864425  13240395  14106139  15642337

In the data frame above, I would like to compute, for each row, the percentage of the total occupied by that val_code, resulting in the following data frame.

I.e., sum up each row and divide by the total of all rows.

            country_name  country_code  val_code         perc
United States of America           231         1  50.14947129
United States of America           231         2  1.363631254
United States of America           231         3  32.48344744
United States of America           231         4  0.260213146
United States of America           231         5  15.74323688

Right now, I am doing this, but it is not working:

grp_df = df.groupby(['country_name', 'val_code']).agg()
pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))

Answer 1:

Get the total for all the columns of interest and then add the percentage column:

In [35]: total = np.sum(df.loc[:, 'y191':].values)
         df['percent'] = df.loc[:, 'y191':].sum(axis=1) / total * 100
         df

Out[35]:
               country_name  country_code  val_code      y191      y192      y193      y194      y195    percent
0  United States of America           231         1  47052179  43361966  42736682  43196916  41751928  50.149471
1  United States of America           231         1   1187385   1201557   1172941   1176366   1192173   1.363631
2  United States of America           231         1  28211467  27668273  29742374  27543836  28104317  32.483447
3  United States of America           231         1    179000    193000    233338    276639    249688   0.260213
4  United States of America           231         1  12613922  12864425  13240395  14106139  15642337  15.743237

So np.sum will sum all the values:

In [32]: total = np.sum(df.loc[:, 'y191':].values)
         total

Out[32]: 434899243

We then call .sum(axis=1)/total * 100 on the columns of interest to sum row-wise, divide by the grand total, and multiply by 100 to get a percentage.
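The steps above can also be written as a self-contained script. This is only a sketch: it rebuilds the question's sample data by hand and uses .loc for the label-based column slice; the names year_cols, row_totals and grand_total are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country_name": ["United States of America"] * 5,
    "country_code": [231] * 5,
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
    "y193": [42736682, 1172941, 29742374, 233338, 13240395],
    "y194": [43196916, 1176366, 27543836, 276639, 14106139],
    "y195": [41751928, 1192173, 28104317, 249688, 15642337],
})

# Label-based slice over the year columns (both endpoints inclusive).
year_cols = df.loc[:, "y191":"y195"]

# Sum across each row, then sum those row totals to get the grand total.
row_totals = year_cols.sum(axis=1)
grand_total = row_totals.sum()

# Each row's share of the grand total, as a percentage.
df["percent"] = row_totals / grand_total * 100

print(df[["val_code", "percent"]])
```

Computing the row totals once and reusing them for the grand total avoids summing the underlying values twice; the result should match the percent column shown above.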



Answer 2:

You can get each value's share of its column total using a lambda function as follows (multiply by 100 for percentages):

>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
       y191      y192      y193      y194      y195
0  0.527231  0.508411  0.490517  0.500544  0.480236
1  0.013305  0.014088  0.013463  0.013631  0.013713
2  0.316116  0.324405  0.341373  0.319164  0.323259
3  0.002006  0.002263  0.002678  0.003206  0.002872
4  0.141342  0.150833  0.151969  0.163455  0.179920

Your example does not have any duplicate values for val_code, so I'm unsure how you want your data to appear (i.e. show percent of total in column vs. total for each val_code group).
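If the frame held several countries and the percentage were meant to be taken within each country_name (which the groupby(level=0) in the question's attempt seems to suggest), one possible approach is groupby with transform. The sketch below uses made-up toy data, not the question's, and the names row_totals and country_totals are illustrative assumptions.

```python
import pandas as pd

# Made-up toy data (NOT from the question): two countries, two val_codes each.
df = pd.DataFrame({
    "country_name": ["A", "A", "B", "B"],
    "val_code": [1, 2, 1, 2],
    "y191": [10, 30, 5, 5],
    "y192": [20, 40, 5, 5],
})

# Sum the year columns across each row.
row_totals = df[["y191", "y192"]].sum(axis=1)

# Total per country, broadcast back so it lines up with every row.
country_totals = row_totals.groupby(df["country_name"]).transform("sum")

# Each row's share of its own country's total, as a percentage.
df["perc"] = 100 * row_totals / country_totals

print(df[["country_name", "val_code", "perc"]])
```

Because transform returns a result aligned with the original index, the division works row by row and the percentages within each country add up to 100.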


