Actual and Percentage Difference on consecutive columns in a Pandas or Pyspark Dataframe

你。 提交于 2019-12-12 04:43:40

问题


I would like to perform two different calculations across consecutive columns in a pandas or pyspark dataframe.

Columns are weeks and the metrics are displayed as rows. I want to calculate the actual and percentage differences across the columns.

The input/output tables incl. the calculations used in Excel are displayed in the following image. I want to replicate these calculations on a pandas or pyspark dataframe.

Raw Data Attached:

Metrics         Week20  Week21  Week22  Week23  Week24  Week25  Week26  Week27
Sales           20301   21132   20059   23062   19610   22734   22140   20699
TRXs            739     729     690     779     701     736     762     655
Attachment Rate 4.47    4.44    4.28    4.56    4.41    4.58    4.55    4.96
AOV             27.47   28.99   29.07   29.6    27.97   30.89   29.06   31.6
Profit          5177    5389    5115    5881    5001    5797    5646    5278
Profit per TRX  7.01    7.39    7.41    7.55    7.13    7.88    7.41    8.06

回答1:


in pandas you could use pct_change(axis=1) and diff(axis=1) methods:

df = df.set_index('Metrics')

# list of metrics with "actual diff"
actual = ['AOV', 'Attachment Rate']

rep = (df[~df.index.isin(actual)].pct_change(axis=1).round(2)*100).fillna(0).astype(str).add('%')
rep = pd.concat([rep,
                 df[df.index.isin(actual)].diff(axis=1).fillna(0)
                ])


In [131]: rep
Out[131]:
                Week20 Week21 Week22 Week23  Week24 Week25 Week26  Week27
Metrics
Sales             0.0%   4.0%  -5.0%  15.0%  -15.0%  16.0%  -3.0%   -7.0%
TRXs              0.0%  -1.0%  -5.0%  13.0%  -10.0%   5.0%   4.0%  -14.0%
Profit            0.0%   4.0%  -5.0%  15.0%  -15.0%  16.0%  -3.0%   -7.0%
Profit per TRX    0.0%   5.0%   0.0%   2.0%   -6.0%  11.0%  -6.0%    9.0%
Attachment Rate      0  -0.03  -0.16   0.28   -0.15   0.17  -0.03    0.41
AOV                  0   1.52   0.08   0.53   -1.63   2.92  -1.83    2.54


来源:https://stackoverflow.com/questions/38484042/actual-and-percentage-difference-on-consecutive-columns-in-a-pandas-or-pyspark-d

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!