Subtract consecutive columns in a Pandas or Pyspark Dataframe

Submitted by 北战南征 on 2021-02-04 15:51:45

Question


I would like to perform the following operation in a pandas or PySpark DataFrame, but I haven't found a solution yet.

I want to subtract the values of consecutive columns in a DataFrame.

The operation I am describing can be seen in the image below.

Bear in mind that the output DataFrame won't have any values in its first column, because the first column of the input has no previous column to subtract.


Answer 1:


diff has an axis parameter, so you can do this in one step:

In [63]:
df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
df

Out[63]:
             A         B         C         D
row1  0.146855  0.250781  0.766990  0.756016
row2  0.528201  0.446637  0.576045  0.576907
row3  0.308577  0.592271  0.553752  0.512420

In [64]:
df.diff(axis=1)

Out[64]:
       A         B         C         D
row1 NaN  0.103926  0.516209 -0.010975
row2 NaN -0.081564  0.129408  0.000862
row3 NaN  0.283694 -0.038520 -0.041331
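
If the all-NaN first column is unwanted (as the question notes, it carries no values), it can be dropped afterwards; a small sketch:

df.diff(axis=1).iloc[:, 1:]                  # keep only columns B..D
df.diff(axis=1).dropna(axis=1, how='all')    # equivalent: drop columns that are entirely NaN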



Answer 2:


df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
# Transpose, diff down the rows, then transpose back; equivalent to df.diff(axis=1)
df.T.diff().T
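
For the PySpark half of the question, neither answer above applies directly. Below is a minimal sketch of the same consecutive-column difference, assuming an active SparkSession and purely numeric columns; the sample data and column names are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data with the same column layout as the pandas examples
sdf = spark.createDataFrame(
    [(0.15, 0.25, 0.77, 0.76), (0.53, 0.45, 0.58, 0.58)],
    ["A", "B", "C", "D"],
)

cols = sdf.columns
# Each column minus the one before it; the first column has no predecessor,
# so it becomes null (mirroring the NaN column that pandas' diff produces).
diffed = sdf.select(
    F.lit(None).cast("double").alias(cols[0]),
    *[(F.col(c) - F.col(prev)).alias(c) for prev, c in zip(cols, cols[1:])],
)
diffed.show()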



Source: https://stackoverflow.com/questions/38321427/subtract-consecutive-columns-in-a-pandas-or-pyspark-dataframe
