Apache Spark (Structured Streaming) : S3 Checkpoint support

温柔的废话 2020-12-09 00:00

From the Spark Structured Streaming documentation: "This checkpoint location has to be a path in an HDFS compatible file system, and can be set as an option in the DataStreamWriter when starting a query." …

4 answers
  •  被撕碎了的回忆
    2020-12-09 00:29

    This problem is fixed in https://issues.apache.org/jira/browse/SPARK-19407.

    However, Structured Streaming checkpointing doesn't work well on S3, because S3 is only eventually consistent (for example, a newly written file may not show up in a directory listing right away). It's not a good idea to use S3 for checkpointing; see https://issues.apache.org/jira/browse/SPARK-19013.
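    To see why eventual consistency hurts, here is a toy model (not Spark's code) of recovering the latest batch from an offset log by listing files. It assumes, for illustration only, that a newly written file appears in listings only after `lag` further writes; the class and function names and the `checkpoint/offsets/` layout are hypothetical.

```python
class EventuallyConsistentStore:
    """Toy object store whose directory listing lags behind writes.

    Assumption for illustration: a key shows up in list() only after
    `lag` further put() calls. Real S3 gave no such fixed bound.
    """

    def __init__(self, lag):
        self.lag = lag
        self.clock = 0      # logical time, advanced by every put()
        self.objects = {}   # key -> (data, time it was written)

    def put(self, key, data):
        self.clock += 1
        self.objects[key] = (data, self.clock)

    def list(self, prefix):
        # Only keys that are "old enough" are visible to a listing.
        return sorted(
            key
            for key, (_, written) in self.objects.items()
            if key.startswith(prefix) and self.clock - written >= self.lag
        )


def latest_batch_id(store, prefix="checkpoint/offsets/"):
    """Find the newest batch by listing offset files (hypothetical layout)."""
    ids = [int(key[len(prefix):]) for key in store.list(prefix)]
    return max(ids) if ids else None


def recovered_batch_after_crash(lag):
    store = EventuallyConsistentStore(lag)
    for batch in range(3):
        # Each micro-batch records its offsets before it is processed.
        store.put(f"checkpoint/offsets/{batch}", f"offsets for batch {batch}")
    # Suppose the driver crashes right after writing batch 2's offsets.
    # On restart, the latest batch is recovered from a listing.
    return latest_batch_id(store)


print(recovered_batch_after_crash(lag=0))  # 2: listing is consistent
print(recovered_batch_after_crash(lag=1))  # 1: batch 2 not listed yet
```

    With `lag=1` the restarted query believes batch 1 is the latest even though batch 2's offsets were already written, so it may try to plan batch 2 again on top of an existing file.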

    Michael Armbrust has said that this won't be fixed in Spark, and that the solution is to wait for S3Guard to be implemented. S3Guard is still some time away.
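    One way to follow this advice in application code is to reject S3 URIs before starting a query. The sketch below is a hypothetical helper, not a Spark API; the set of schemes and the example paths (`namenode:8020`, `my-bucket`) are assumptions. Only the `checkpointLocation` option name comes from Spark itself.

```python
from urllib.parse import urlparse

# Hypothetical policy helper (not part of Spark): URI schemes assumed to
# point at Amazon S3 through Hadoop's S3 connectors.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def check_checkpoint_location(path):
    """Return `path` unchanged, or raise ValueError if it is an S3 URI."""
    scheme = urlparse(path).scheme.lower()
    if scheme in S3_SCHEMES:
        raise ValueError(
            f"checkpoint location {path!r} is on S3; "
            "use an HDFS path for Structured Streaming checkpoints instead"
        )
    return path


# The validated value would then be passed to the writer, e.g.
# .option("checkpointLocation", check_checkpoint_location(...)).
print(check_checkpoint_location("hdfs://namenode:8020/checkpoints/my-query"))
try:
    check_checkpoint_location("s3a://my-bucket/checkpoints/my-query")
except ValueError as err:
    print(err)
```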
