How to find maximum value of a column in python dataframe

Submitted by 依然范特西╮ on 2020-06-09 17:58:17

Question


I have a data frame in PySpark. It has a column called id whose values are unique.

Now I want to find the maximum value of the column id in the data frame.

I tried the following:

df['id'].max()

But I got the following error:

TypeError: 'Column' object is not callable

Please let me know how to find the maximum value of a column in a data frame.

The link in the answer by @Dadep gives the correct answer.


Answer 1:


If you are using pandas, .max() will work:

>>> import pandas as pd
>>> df2 = pd.DataFrame({'A': [1, 5, 0], 'B': [3, 5, 6]})
>>> df2['A'].max()
5

Otherwise, if it is a Spark DataFrame, see the question:

Best way to get the max value in a Spark dataframe column
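
As for why the attempt in the question fails: to my understanding, indexing a Spark DataFrame returns a Column expression rather than data, and attribute access on a Column is treated as a nested-field lookup that yields another Column, which the trailing parentheses then try to call. The sketch below mimics that behaviour with a hypothetical stand-in class (FakeColumn is made up for illustration and is not part of pyspark), so it runs without Spark:

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark Column; NOT the pyspark class."""

    def __init__(self, expr):
        self.expr = expr

    def __getattr__(self, name):
        # Only invoked for attributes that do not exist on the object.
        # Dunder lookups are refused so normal Python probing still works.
        if name.startswith("__"):
            raise AttributeError(name)
        # Any other unknown attribute is treated as nested-field access
        # and produces a new column expression.
        return FakeColumn(f"{self.expr}.{name}")


col = FakeColumn("id")
field = col.max  # no error yet: this is just another column expression

try:
    field()  # calling a column expression is what raises the TypeError
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(field.expr)
print(error_message)
```

In other words, .max on a Spark column is not a method at all, which is why the aggregate has to come from pyspark.sql.functions or from agg.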




Answer 2:


I'm coming from Scala, but I believe this also applies to Python:

max_row = df.select(max("id")).first()

You first have to import the following:

from pyspark.sql.functions import max



Answer 3:


The following can be used in pyspark:

from pyspark.sql.functions import max
df.select(max("id")).show()



Answer 4:


You can use the max aggregate, as also mentioned in the PySpark documentation linked below:

Link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg

Code:

row1 = df1.agg({"id": "max"}).collect()[0]


Source: https://stackoverflow.com/questions/43924686/how-to-find-maximum-value-of-a-column-in-python-dataframe
