get value out of dataframe

Submitted by 人走茶凉 on 2019-12-28 04:19:07

Question


In Scala I can do get(#) or getAs[Type](#) to get values out of a dataframe. How should I do it in pyspark?

I have a two-column DataFrame: item (string) and salesNum (integer). I do a groupBy and mean to get the mean of those numbers like this:

saleDF.groupBy("salesNum").mean().collect()

and it works. Now I have the mean in a dataframe with one value.

How can I get that value out of the dataframe to get the mean as a float number?


Answer 1:


collect() returns your results as a Python list. To get the value out of the list, you just need to take the first element like this:

saleDF.groupBy("salesNum").mean().collect()[0]



Answer 2:


To be precise, collect() returns a list whose elements are of type pyspark.sql.types.Row.

In your case, to extract the actual value you should do:

saleDF.groupBy("salesNum").mean().collect()[0]["avg(yourColumnName)"]

where yourColumnName is the name of the column you are taking the mean of (pyspark, when applying mean, renames the resulting column in this way by default).

As an example, I ran the following code. Look at the types and outputs of each step.

>>> columns = ['id', 'dogs', 'cats', 'nation']
>>> vals = [
...      (2, 0, 1, 'italy'),
...      (1, 2, 0, 'italy'),
...      (3, 4, 0, 'france')
... ]
>>> df = sqlContext.createDataFrame(vals, columns)
>>> df.groupBy("nation").mean("dogs").collect()
[Row(nation=u'france', avg(dogs)=4.0), Row(nation=u'italy', avg(dogs)=1.0)]
>>> df.groupBy("nation").mean("dogs").collect()[0]
Row(nation=u'france', avg(dogs)=4.0)
>>> df.groupBy("nation").mean("dogs").collect()[0]["avg(dogs)"]
4.0
>>> type(df.groupBy("nation").mean("dogs").collect())
<type 'list'>
>>> type(df.groupBy("nation").mean("dogs").collect()[0])
<class 'pyspark.sql.types.Row'>
>>> type(df.groupBy("nation").mean("dogs").collect()[0]["avg(dogs)"])
<type 'float'>


Source: https://stackoverflow.com/questions/38058950/get-value-out-of-dataframe
