Spark doesn't read/write information from s3 (ResponseCode=400, ResponseMessage=Bad Request)

Submitted anonymously (unverified) on 2019-12-03 00:59:01

Question:

I implemented a Spark application and created the Spark context like this:

    private JavaSparkContext createJavaSparkContext() {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        if (conf.get("spark.master", null) == null) {
            conf.setMaster("local[4]");
        }
        conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
        conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
        conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());

        return new JavaSparkContext(conf);
    }

I then try to read data from S3 via the Spark Dataset API (Spark SQL):

    String s = "s3a://" + getCredentialConfig().getS3Bucket();
    Dataset<Row> csv = getSparkSession()
            .read()
            .option("header", "true")
            .csv(s + "/dataset.csv");

    System.out.println("Read size :" + csv.count());

This fails with the following error:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1A3E8CBD4959289D, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: Q1Fv8sNvcSOWGbhJSu2d3Nfgow00388IpXiiHNKHz8vI/zysC8V8/YyQ1ILVsM2gWQIyTy1miJc= 

Hadoop version: 2.7

AWS endpoint: s3.eu-central-1.amazonaws.com

(On Hadoop 2.8 everything works fine.)

Answer 1:

The problem is that the Frankfurt region doesn't support the older s3n scheme, so s3a must be used, and this region only accepts Signature Version 4 authentication: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

EU (Frankfurt) eu-central-1 Version 4 only

This means V4 signing needs to be enabled on the AWS client by adding the system property

com.amazonaws.services.s3.enableV4 -> true

    conf.set("com.amazonaws.services.s3.enableV4", "true"); // setting it via SparkConf didn't work for me

On my local machine I used:

System.setProperty("com.amazonaws.services.s3.enableV4", "true"); 
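Note that ordering matters: the property must be set before the AWS SDK creates its first S3 client, i.e. before the SparkContext is constructed. A minimal sketch of that ordering (class name is mine; the Spark calls are shown as comments so the sketch compiles without Spark on the classpath):

```java
public class EnableV4Example {
    public static void main(String[] args) {
        // Enable Signature Version 4 globally for the AWS SDK used by s3a.
        // This must run BEFORE the SparkContext / first S3 client is created.
        System.setProperty("com.amazonaws.services.s3.enableV4", "true");

        // Only after this point create the context, e.g.:
        //   SparkConf conf = new SparkConf().setAppName("test");
        //   conf.set("spark.hadoop.fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com");
        //   JavaSparkContext sc = new JavaSparkContext(conf);

        System.out.println(System.getProperty("com.amazonaws.services.s3.enableV4"));
        // prints: true
    }
}
```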

When running on AWS EMR, the parameters need to be passed to spark-submit:

    spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
    spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
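On EMR those options are typically passed with `--conf`; a sketch of the full command (the class and jar names are placeholders for your own application):

    spark-submit \
      --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
      --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
      --class com.example.Main \
      app.jar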

Additionally, you should set the implementation classes for the file systems:

    conf.set("spark.hadoop.fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
    conf.set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("spark.hadoop.fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
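Putting the pieces together, the same settings can also live in spark-defaults.conf instead of code (a sketch; swap in your own endpoint):

    spark.driver.extraJavaOptions     -Dcom.amazonaws.services.s3.enableV4=true
    spark.executor.extraJavaOptions   -Dcom.amazonaws.services.s3.enableV4=true
    spark.hadoop.fs.s3a.endpoint      s3.eu-central-1.amazonaws.com
    spark.hadoop.fs.s3a.impl          org.apache.hadoop.fs.s3a.S3AFileSystem

The `spark.hadoop.` prefix tells Spark to copy the property into the Hadoop Configuration that s3a reads.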

