How to get the output from console streaming sink in Zeppelin?

隐瞒了意图╮ 2020-12-09 12:22

I'm struggling to get the console sink working with PySpark Structured Streaming when run from Zeppelin. Basically, I'm not seeing any results printed to the

2 answers
  •  自闭症患者
    2020-12-09 12:45

    zeppelin-0.7.3-bin-all uses Spark 2.1.0 (so, unfortunately, there is no rate format to test Structured Streaming with).


    If you use the socket source, make sure that nc -lk 9999 is already running before you start the streaming query (the query simply stops otherwise).

    Also make sure that the query is indeed up and running.

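    // Read lines of text from the socket that nc -lk 9999 is serving.
    val lines = spark
      .readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
    // Start the query with the console sink; q is the handle to the running query.
    val q = lines.writeStream.format("console").start()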
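
    One way to check that the query is up and running is to inspect its handle from the notebook paragraph itself, since values printed on the paragraph's own thread do show up in Zeppelin. The following is only a sketch: it assumes q is the StreamingQuery started above and spark is the SparkSession that Zeppelin's Spark interpreter provides; describe is a helper name made up for this example.

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper: summarizes the state of a streaming query in one line.
def describe(query: StreamingQuery): String =
  s"id=${query.id} active=${query.isActive} status=${query.status} " +
    s"error=${query.exception.map(_.getMessage).getOrElse("none")}"

// State of the query started above.
println(describe(q))

// Every query the session still considers active.
spark.streams.active.foreach(query => println(describe(query)))

// Most recent progress report, if at least one trigger has completed.
q.recentProgress.lastOption.foreach(println)
```

    If isActive is false or exception is defined, the query has already stopped, which is what happens when nothing is listening on the socket.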
    

    It is indeed true that you won't see the output in a Zeppelin notebook, possibly because:

    1. Streaming queries start on their own threads (which seem to be outside Zeppelin's reach).

    2. The console sink writes to standard output (it uses the Dataset.show operator on that separate thread).

    Together, these make it impossible to "intercept" the output in Zeppelin.

    That brings us to the real question:

    Where is the standard output written to in Zeppelin?

    Well, with a very limited understanding of the internals of Zeppelin, I thought it could be logs/zeppelin-interpreter-spark-[hostname].log, but unfortunately I could not find the output from the console sink there. That file holds the logs from Spark (and Structured Streaming in particular), which go through log4j; the console sink does not use log4j.

    It looks as if your only long-term solution is to write your own console-like custom sink that uses a log4j logger. Honestly, that is not as hard as it sounds. Follow the sources of the console sink.
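
    A console-like sink that logs through log4j could look roughly like the sketch below. It is an illustration, not the console sink's actual source: Log4jSink, Log4jSinkProvider and the numRows option are names made up here, and Sink is an internal Spark interface (org.apache.spark.sql.execution.streaming) that may change between versions. The sketch assumes Spark 2.1.x with the bundled log4j 1.2.

```scala
import org.apache.log4j.Logger
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical sink: logs the rows of every micro-batch through log4j
// instead of printing them to standard output.
class Log4jSink(numRows: Int) extends Sink {
  @transient private lazy val log = Logger.getLogger(getClass.getName)

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    log.info(s"Batch: $batchId")
    // collect() pulls the whole batch to the driver, so this is only
    // suitable for small debugging batches.
    data.collect().take(numRows).foreach(row => log.info(row.mkString(", ")))
  }
}

// Hypothetical provider so that the sink can be selected with format(...).
class Log4jSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink =
    new Log4jSink(parameters.get("numRows").map(_.toInt).getOrElse(20))
}
```

    The query would then be started with lines.writeStream.format(classOf[Log4jSinkProvider].getName).start(), most likely together with a checkpointLocation option, which Spark tends to require for sinks other than console and memory. The rows should then end up wherever the interpreter's log4j configuration sends them, presumably the interpreter log file mentioned above. Classes defined directly in a notebook paragraph may not be loadable by name, so packaging them in a jar and adding it to the Spark interpreter is probably the safer route.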
