Saving result of DataFrame show() to string in pyspark

本秂侑毒 submitted on 2020-01-21 11:51:35

Question


I would like to capture the output of show() in PySpark as a string, as other questions have done for related cases. I could only find a solution for Scala, not for PySpark.

df.show()
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

The ultimate purpose is to capture this output as a string and pass it to logger.info. I tried logger.info(df.show()), but that only prints the table to the console.


Answer 1:


You can build a helper function using the same approach as the post you linked, Capturing the result of explain() in pyspark. If you examine the source code for show(), you will see that it calls self._jdf.showString() and prints the result.

The answer depends on which version of Spark you are using, because the number of arguments to show() has changed over time.
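If tracking those signature changes is undesirable, one version-independent alternative is to capture what show() prints rather than calling the private _jdf object. The sketch below assumes that show() writes through Python's print to sys.stdout, which is what the PySpark source does in the versions discussed here. The names capture_show and FakeFrame are hypothetical; FakeFrame only stands in for a real DataFrame so the example runs without Spark.

```python
import io
from contextlib import redirect_stdout


def capture_show(df, *args, **kwargs):
    """Return whatever df.show(*args, **kwargs) prints, as a string."""
    buffer = io.StringIO()
    # Temporarily point sys.stdout at an in-memory buffer
    with redirect_stdout(buffer):
        df.show(*args, **kwargs)
    return buffer.getvalue()


class FakeFrame:
    """Hypothetical stand-in for a DataFrame so the sketch runs without Spark."""

    def show(self, n=20):
        print("+---+\n|age|\n+---+")


text = capture_show(FakeFrame())
```

With a real DataFrame the call would be capture_show(df) or capture_show(df, 5, truncate=False). Note that redirect_stdout swaps sys.stdout for the whole process, so this approach is best avoided when other threads print at the same time.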

Spark Versions 2.3 and above

In version 2.3, the vertical argument was added.

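def getShowString(df, n=20, truncate=True, vertical=False):
    if isinstance(truncate, bool) and truncate:
        # truncate=True selects the default width of 20 characters
        return df._jdf.showString(n, 20, vertical)
    else:
        # a number sets the width; False becomes 0, which disables truncation
        return df._jdf.showString(n, int(truncate), vertical)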

Spark Versions 1.5 through 2.2

As of version 1.5, the truncate argument was added.

def getShowString(df, n=20, truncate=True):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    else:
        return df._jdf.showString(n, int(truncate))

Spark Versions 1.3 through 1.4

The show function was first introduced in version 1.3.

def getShowString(df, n=20):
    return df._jdf.showString(n)

Now use the helper function as follows:

x = getShowString(df)  # default arguments
print(x)
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

Or in your case:

logger.info(getShowString(df))
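Because rendering the table makes Spark compute the rows, it can be worth skipping the call when the log level is disabled. The following is a minimal sketch of that idea, not part of the original answer: log_show and the logger name are hypothetical, and show_string is expected to be whichever getShowString variant matches your Spark version.

```python
import logging

logger = logging.getLogger("show_capture_demo")  # hypothetical logger name


def log_show(df, show_string, level=logging.INFO):
    """Log the rendered table; skip rendering when `level` is disabled."""
    if not logger.isEnabledFor(level):
        # show_string() is never called, so no rows are computed
        return False
    # The leading newline keeps the table's top border off the log prefix line
    logger.log(level, "\n%s", show_string(df))
    return True
```

It would be called as log_show(df, getShowString). Passing the helper in as an argument keeps the logging wrapper independent of the Spark version.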


Source: https://stackoverflow.com/questions/55653609/saving-result-of-dataframe-show-to-string-in-pyspark
