Saving contents of df.show() as a string in spark-scala app

别来无恙 submitted on 2019-12-06 03:06:56

Question


I need to save the output of df.show() as a string so that I can email it directly.

For example, take the following snippet from the official Spark docs:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I need to save the table above, exactly as it is printed in the console, as a string. I looked at log4j for printing the log, but couldn't find any information on logging only this output.

Can someone help me with it?


Answer 1:


scala.Console has a withOut method for this kind of thing:

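import java.io.ByteArrayOutputStream

val outCapture = new ByteArrayOutputStream
// Everything printed through Console/println inside the block goes to outCapture
Console.withOut(outCapture) {
  df.show()
}
val result = new String(outCapture.toByteArray)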
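The same idea can be wrapped in a small reusable helper. This is a sketch, not part of the original answer: `CaptureStdout` and `captureOut` are hypothetical names. It relies on Dataset.show printing through Scala's println, which writes to Console.out, and it passes an explicit UTF-8 PrintStream so the captured bytes are decoded with the same charset they were encoded with.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureStdout {
  /** Runs `thunk` with scala.Console.out redirected into an in-memory buffer
    * and returns the thunk's result together with everything it printed.
    * Only output written through Console (e.g. println) is captured;
    * code that writes to System.out directly is not redirected.
    */
  def captureOut[T](thunk: => T): (T, String) = {
    val buffer = new ByteArrayOutputStream()
    // Encode explicitly as UTF-8 so the decoding below uses the same charset.
    val stream = new PrintStream(buffer, true, "UTF-8")
    // withOut restores the previous Console.out when the block finishes,
    // even if thunk throws.
    val result = Console.withOut(stream)(thunk)
    stream.flush()
    (result, buffer.toString("UTF-8"))
  }
}
```

With a DataFrame in scope, `val (_, table) = CaptureStdout.captureOut(df.show())` should give the printed table as `table`, ready to put into an email body.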



Answer 2:


A workaround is to redirect standard output to a variable:

val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)

val oldPs = Console.out
Console.setOut(ps)
try {
  df.show()
} finally {
  Console.setOut(oldPs) // restore the original output even if show() throws
}
val content = baos.toString()

Note that this produces a deprecation warning: Console.setOut has been deprecated since Scala 2.11 in favor of Console.withOut.

You can also re-implement Dataset.showString, the method that generates this table; internally it uses take to fetch the rows. Maybe it's also a good moment to create a PR to make showString public? :)
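A rough sketch of that re-implementation idea, assuming you only need the default truncated layout. `ShowFormat.formatTable` is a hypothetical helper, not Spark's private showString: it formats a header and string cells the way df.show() lays them out (right-aligned cells, minimum column width 3, cells longer than 20 characters cut to 17 plus "..."). It does not reproduce every option, for example vertical output or the "only showing top N rows" footer, and it assumes every row has as many cells as the header.

```scala
object ShowFormat {
  /** Renders a header and rows of string cells as a show()-style ASCII table.
    * Hypothetical helper; mimics the default df.show() layout only.
    */
  def formatTable(header: Seq[String], rows: Seq[Seq[String]], truncate: Int = 20): String = {
    // Shorten long cells the way the default truncated output does.
    def cut(s: String): String =
      if (truncate > 0 && s.length > truncate) {
        if (truncate < 4) s.take(truncate) else s.take(truncate - 3) + "..."
      } else s

    val cells: Seq[Seq[String]] = (header +: rows).map(_.map(cut))
    // Each column is as wide as its longest cell, but at least 3 characters.
    val widths: Seq[Int] = header.indices.map { i =>
      math.max(3, cells.map(_(i).length).max)
    }
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    // Right-align every cell within its column.
    def line(row: Seq[String]): String =
      row.zip(widths).map { case (c, w) => " " * (w - c.length) + c }.mkString("|", "|", "|")

    val body = cells.tail.map(line)
    (Seq(sep, line(cells.head), sep) ++ body :+ sep).mkString("\n")
  }
}
```

To feed it from a DataFrame, something like `val rows = df.take(20).map(_.toSeq.map(v => if (v == null) "null" else v.toString))` followed by `ShowFormat.formatTable(df.columns.toSeq, rows.toSeq)` should produce the same table as the people.json example in the question, without touching standard output.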



Source: https://stackoverflow.com/questions/48546963/saving-contents-of-df-show-as-a-string-in-spark-scala-app
