Structured streaming won't write DF to file sink citing /_spark_metadata/9.compact doesn't exist

既然无缘 2020-12-15 09:50

I'm building a Kafka ingest module in EMR 5.11.1, Spark 2.2.1. My intention is to use Structured Streaming to consume from a Kafka topic, do some processing, and store to E

2 answers
  • 2020-12-15 10:05

    It turns out that S3 does not support the read-after-write consistency that Spark's checkpoint and `_spark_metadata` files rely on, which is why the sink fails to find the `9.compact` file it just wrote.

    This article suggests using AWS EFS for checkpointing.

    S3 remains a good place to read input data from, or to write final output to.
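
    The split described above (data on S3, checkpoint on a consistent filesystem such as an EFS mount) could look roughly like this PySpark sketch. All paths, the broker address, and the topic name are placeholders, not values from the original question:

    ```python
    # Hypothetical sketch: Kafka -> Structured Streaming -> Parquet on S3,
    # with the checkpoint kept on an EFS mount instead of S3.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "my-topic")                   # placeholder topic
          .load())

    query = (df.selectExpr("CAST(value AS STRING) AS value")
             .writeStream
             .format("parquet")
             # data lands on S3, which is fine for plain writes
             .option("path", "s3://my-bucket/ingest/")
             # checkpoint goes to a POSIX filesystem (e.g. an EFS mount)
             .option("checkpointLocation", "/mnt/efs/checkpoints/ingest")
             .start())

    query.awaitTermination()
    ```

    This is only a sketch and needs a live Kafka broker and Spark cluster to run; the key point is that `path` and `checkpointLocation` can target different filesystems.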

  • 2020-12-15 10:06

    I solved this problem by clearing my checkpoint path:

    1. remove your checkpoint path:

      sudo -u hdfs hdfs dfs -rm -r ${your_checkpoint_path}

    2. resubmit your spark job.
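
    If the checkpoint lives on S3 rather than HDFS, the equivalent cleanup might look like the following; the bucket, prefix, and jar/class names are hypothetical placeholders:

    ```shell
    # Remove the (possibly corrupted) checkpoint directory on S3.
    aws s3 rm --recursive s3://my-bucket/checkpoints/ingest/

    # Resubmit the streaming job so it starts from a fresh checkpoint.
    spark-submit --class com.example.Ingest my-ingest.jar
    ```

    Note that deleting the checkpoint discards the stream's progress, so the job will reprocess data from the configured starting offsets and any exactly-once guarantees across the restart are lost.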
