Spark Streaming checkpoint to Amazon S3

南旧 2020-12-14 12:38

I am trying to checkpoint the RDD to a non-HDFS system. The DSE documentation suggests it is not possible to use the Cassandra file system, so I am planning to use Amazon S3. But …

2 Answers
  •  情书的邮戳 2020-12-14 12:57

    From the answer in the link:

    Solution 1:

    Export the AWS credentials in the shell before launching the job (values elided here):

        export AWS_ACCESS_KEY_ID=
        export AWS_SECRET_ACCESS_KEY=

    Then, inside the application, register the checkpoint directory:

        ssc.checkpoint(checkpointDirectory)

    Set the checkpoint directory to an S3 URL, e.g. s3n://spark-streaming/checkpoint.

    Then launch your Spark application with spark-submit. This works in Spark 1.4.2.
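
    For context, here is a minimal sketch of how Solution 1 fits together in application code. The app name, batch interval, socket source, and bucket path are illustrative placeholders, not part of the original answer:

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object S3CheckpointApp {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf().setAppName("S3CheckpointApp")
            // 10-second batch interval; tune to your workload
            val ssc = new StreamingContext(conf, Seconds(10))

            // Credentials are picked up from AWS_ACCESS_KEY_ID /
            // AWS_SECRET_ACCESS_KEY exported before spark-submit
            ssc.checkpoint("s3n://spark-streaming/checkpoint")

            // Placeholder source; any DStream is checkpointed the same way
            val lines = ssc.socketTextStream("localhost", 9999)
            lines.count().print()

            ssc.start()
            ssc.awaitTermination()
          }
        }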

    Solution 2:

        import org.apache.hadoop.conf.Configuration
        import org.apache.spark.streaming.StreamingContext

        val hadoopConf: Configuration = new Configuration()
        // Map s3n:// to the native S3 filesystem and supply credentials (the original
        // set "fs.s3.impl", but an s3n:// checkpoint URL reads the fs.s3n.* properties)
        hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
        hadoopConf.set("fs.s3n.awsAccessKeyId", "id-1")
        hadoopConf.set("fs.s3n.awsSecretAccessKey", "secret-key")

        // Recover from an existing checkpoint, or build a fresh context
        val ssc = StreamingContext.getOrCreate(
          checkPointDir,
          () => createStreamingContext(checkPointDir, config),
          hadoopConf)
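
    The createStreamingContext factory referenced above is not shown in the answer. Here is a hypothetical sketch, assuming config is a SparkConf and using a placeholder socket source. Note that getOrCreate only invokes the factory when no checkpoint exists yet, so all DStream setup must happen inside it:

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        // Hypothetical factory: every name and value here is illustrative
        def createStreamingContext(checkPointDir: String, config: SparkConf): StreamingContext = {
          val ssc = new StreamingContext(config, Seconds(10))
          // The checkpoint must be registered inside the factory so a newly
          // created context knows where to write its metadata
          ssc.checkpoint(checkPointDir)
          val lines = ssc.socketTextStream("localhost", 9999)
          lines.count().print()
          ssc
        }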
    
