Question
I need to save the output of df.show() as a string so that I can email it directly.
For example, take the snippet below from the official Spark docs:
val df = spark.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
I need to save the table above, which is printed to the console, as a string. I looked at log4j as a way to log it, but couldn't find any information on capturing only this output.
Can someone help me with it?
Answer 1:
scala.Console has a withOut method for this kind of thing:
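import java.io.ByteArrayOutputStream

// Output printed through scala.Console inside the block goes to outCapture
val outCapture = new ByteArrayOutputStream
Console.withOut(outCapture) {
  df.show()
}
// Decodes with the platform default charset
val result = new String(outCapture.toByteArray)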
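To try the withOut approach without a Spark session, here is a minimal sketch that wraps it in a reusable helper. The names CaptureDemo and captureOutput are hypothetical, and the println calls only stand in for df.show(). The charset is fixed to UTF-8 on both the writing and the reading side, which is a choice made for this sketch and not something the answer requires.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureDemo {
  // Hypothetical helper: runs `body` with Console.out redirected to an
  // in-memory buffer and returns everything the block printed.
  def captureOutput(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Encode as UTF-8 explicitly so the decode step below matches.
    val stream = new PrintStream(buffer, true, "UTF-8")
    try Console.withOut(stream)(body)
    finally stream.flush()
    buffer.toString("UTF-8")
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for df.show(); with Spark this would be captureOutput { df.show() }
    val captured = captureOutput {
      println("+----+-------+")
      println("| age|   name|")
      println("+----+-------+")
    }
    print(captured)
  }
}
```

Note that this only captures output written through scala.Console (println, Console.out). Code that writes to java.lang.System.out directly is not redirected by withOut.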
Answer 2:
A workaround is to redirect standard output to a variable:
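val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)
val oldPs = Console.out       // remember the current stream
Console.setOut(ps)            // deprecated, see the note below
df.show()
ps.flush()                    // make sure everything has reached baos
val content = baos.toString()
Console.setOut(oldPs)         // restore the original stream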
Note that this produces one deprecation warning: Console.setOut is deprecated in favor of Console.withOut.
You can also reimplement the method Dataset.showString, which generates the string that show() prints. It uses take in the background. Maybe it's also a good moment to create a PR to make showString public? :)
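As a rough illustration of the reimplementation idea, the sketch below builds the same kind of table from a header and already-stringified rows. It is not Spark's showString: ShowStringSketch and renderTable are hypothetical names, truncation and other details of the real method are left out, and every row is assumed to have as many cells as the header.

```scala
object ShowStringSketch {
  // Hypothetical helper: renders a header and rows as a right-aligned
  // ASCII table. Assumes each row has exactly header.length cells.
  def renderTable(header: Seq[String], rows: Seq[Seq[String]]): String = {
    // Column width = longest cell in that column, header included.
    val widths = header.indices.map { i =>
      (header(i) +: rows.map(_(i))).map(_.length).max
    }
    val separator = widths.map(w => "-" * w).mkString("+", "+", "+")

    // Pads each cell on the left so the text is right-aligned.
    def line(cells: Seq[String]): String =
      cells.zip(widths)
        .map { case (cell, w) => " " * (w - cell.length) + cell }
        .mkString("|", "|", "|")

    val lines =
      Seq(separator, line(header), separator) ++ rows.map(line) ++ Seq(separator)
    lines.mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val table = renderTable(
      Seq("age", "name"),
      Seq(Seq("null", "Michael"), Seq("30", "Andy"), Seq("19", "Justin"))
    )
    println(table)
  }
}
```

With a real DataFrame, the inputs could presumably come from df.columns and from df.take(n), with each Row converted via toSeq and null values mapped to the string "null". That wiring is an assumption here and is not shown or tested.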
Source: https://stackoverflow.com/questions/48546963/saving-contents-of-df-show-as-a-string-in-spark-scala-app