I'm building a Kafka ingest module on EMR 5.11.1 with Spark 2.2.1. My intention is to use Structured Streaming to consume from a Kafka topic, do some processing, and store to E
It turns out that S3 does not provide the consistent read-after-write semantics that Spark's checkpointing relies on.
This article suggests using AWS EFS for checkpointing.
S3 remains a good place to ingest data from, or egest data to.
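To make the split concrete, here is a minimal sketch of a Structured Streaming job where the checkpoint lives on a consistent filesystem (an EFS mount, as the article suggests, or HDFS) while the data itself still lands in S3. All names here are placeholders, not from the original post: the broker address, topic, bucket, and the `/mnt/efs` mount point are assumptions, and the job needs the `spark-sql-kafka-0-10` package on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-ingest")
      .getOrCreate()

    // Consume from Kafka (requires the spark-sql-kafka-0-10 package).
    // Broker and topic names below are placeholders.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Data sink can stay on S3; only the checkpoint needs a filesystem
    // with read-after-write semantics. The EFS mount path is a placeholder.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "s3://my-bucket/ingest/")                              // placeholder
      .option("checkpointLocation", "file:///mnt/efs/checkpoints/kafka-ingest") // placeholder
      .start()

    query.awaitTermination()
  }
}
```

Note the `file://` scheme on the checkpoint location: on EMR the default filesystem is HDFS, so a bare local path would otherwise be resolved against HDFS rather than the EFS mount.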
I solved this by clearing my checkpoint path:

1. Remove your checkpoint path (`-rmr` is deprecated; use `-rm -r`):

   sudo -u hdfs hdfs dfs -rm -r ${your_checkpoint_path}

2. Resubmit your Spark job.