I would like to capture the result of show() in pyspark, similar to here and here. I was not able to find a solution for pyspark, only Scala.
You can build a helper function using the same approach as in the post you linked, Capturing the result of explain() in pyspark: examine the source code for show() and observe that it calls self._jdf.showString().
The answer depends on which version of Spark you are using, because the signature of show() has changed over time. In version 2.3, the vertical argument was added.
def getShowString(df, n=20, truncate=True, vertical=False):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    else:
        return df._jdf.showString(n, int(truncate), vertical)
As of version 1.5, the truncate argument was added.
def getShowString(df, n=20, truncate=True):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    else:
        return df._jdf.showString(n, int(truncate))
The show function was first introduced in version 1.3.
def getShowString(df, n=20):
    return df._jdf.showString(n)
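If the same code has to run on more than one Spark version, you could select the right variant at runtime by parsing the version string. A minimal sketch (in a live session you would pass pyspark.__version__ rather than the literal strings used here for illustration):

```python
def spark_version_tuple(version_string):
    """Parse a Spark version string like '2.3.1' into a (major, minor) tuple."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])

# In practice: version = spark_version_tuple(pyspark.__version__)
version = spark_version_tuple("2.3.1")
if version >= (2, 3):
    print("use the variant with the vertical argument")
elif version >= (1, 5):
    print("use the variant with the truncate argument")
else:
    print("use the n-only variant")
```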
Now use the helper function as follows:
x = getShowString(df) # default arguments
print(x)
#+----+-------+
#| age| name|
#+----+-------+
#|null|Michael|
#| 30| Andy|
#| 19| Justin|
#+----+-------+
Or in your case:
logger.info(getShowString(df))
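As an alternative that avoids the private _jdf attribute entirely, you could redirect stdout while show() runs and capture what it prints. A sketch, demonstrated with a hypothetical stand-in object (FakeDF is only for illustration; a real DataFrame with a show() method works the same way):

```python
import io
from contextlib import redirect_stdout

def capture_show(df, *args, **kwargs):
    """Return whatever df.show() would print as a string."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        df.show(*args, **kwargs)
    return buf.getvalue()

# Hypothetical stand-in with the same interface as DataFrame.show();
# pass a real DataFrame in directly instead.
class FakeDF:
    def show(self, n=20, truncate=True):
        print("+---+-------+")
        print("|age|   name|")
        print("+---+-------+")

captured = capture_show(FakeDF())
print(captured)
```

This version is insulated from changes to showString()'s signature, since it only relies on the public show() method.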