amazon-s3

AWS: access S3 from Spark using IAM role

Posted by 我是研究僧i on 2020-06-11 11:43:20
Question: I want to access S3 from Spark. I don't want to configure any secret and access keys; I want to access it by configuring an IAM role, so I followed the steps given in s3-spark. But it still is not working from my EC2 instance (which is running standalone Spark). It works when I test with the CLI:

    [ec2-user@ip-172-31-17-146 bin]$ aws s3 ls s3://testmys3/
    2019-01-16 17:32:38        130 e.json

but it did not work when I tried the read below:

    scala> val df = spark.read.json("s3a://testmys3/*")

I am getting the below error
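The usual fix for this setup is to point the S3A connector at the EC2 instance profile instead of static keys. A minimal PySpark sketch, assuming the hadoop-aws jar and a matching AWS SDK v1 jar are already on the classpath (the provider class name below comes from SDK v1; which versions are actually in use here is an assumption):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-via-iam-role")
    # Pull temporary credentials from the EC2 instance profile instead of
    # fs.s3a.access.key / fs.s3a.secret.key (assumption: AWS SDK v1 jars present).
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    )
    .getOrCreate()
)

# The same read that failed in the question, now authenticated via the role.
df = spark.read.json("s3a://testmys3/*")
df.printSchema()
```

Newer hadoop-aws releases also ship org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider for the same purpose.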

Read files from S3 - Pyspark [duplicate]

Posted by 别来无恙 on 2020-06-11 03:15:18
Question: This question already has answers here: Spark Scala read csv file using s3a (1 answer); How to access s3a:// files from Apache Spark? (10 answers); S3A: fails while S3: works in Spark EMR (2 answers). Closed last year. I have been looking for a clear answer to this question all morning but couldn't find anything understandable. I just started to use pyspark (installed with pip) a while ago and have a simple .py file reading data from local storage, doing some processing and writing results
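A common stumbling block with pip-installed PySpark is that the S3A filesystem classes are not bundled and have to be pulled in via spark.jars.packages. A minimal sketch; the hadoop-aws version, bucket, and path below are placeholders and assumptions, not values from the question:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("read-s3-from-pip-pyspark")
    # pip-installed PySpark ships without the S3A connector, so fetch it at
    # startup; the hadoop-aws version must match the Hadoop build bundled
    # with your Spark (3.3.4 here is an assumption).
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .getOrCreate()
)

# Credentials resolve via the default chain (env vars, ~/.aws/credentials, IAM role).
df = spark.read.csv("s3a://my-bucket/input/data.csv", header=True)  # hypothetical path
df.show(5)
```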

Amazon S3 console: download multiple files at once

Posted by 久未见 on 2020-06-09 08:34:06
Question: When I log in to my S3 console I am unable to download multiple selected files (the web UI allows downloads only when one file is selected): https://console.aws.amazon.com/s3 Is this something that can be changed in the user policy, or is it a limitation of Amazon?

Answer 1: It is not possible through the AWS Console web user interface, but it's a very simple task if you install the AWS CLI. You can check the installation and configuration steps in Installing the AWS Command Line Interface. After that
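For reference, the CLI route the answer points toward is something like `aws s3 cp s3://my-bucket/my-prefix/ . --recursive` (or `aws s3 sync`). The same thing can also be scripted with boto3 — a minimal sketch with hypothetical bucket and prefix names:

```python
import os
import boto3

s3 = boto3.client("s3")

# Download every object under a prefix -- the multi-file download the console lacks.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket", Prefix="reports/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
            continue
        local_path = os.path.join(".", key)
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        s3.download_file("my-bucket", key, local_path)
```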

How to create AWS Athena table via Glue crawler when the s3 data store has both json and .gz compressed files?

Posted by 自作多情 on 2020-06-09 03:59:25
Question: I have two problems in my intended solution. 1. My S3 store structure is as follows:

    mainfolder/date=2019-01-01/hour=14/abcd.json
    mainfolder/date=2019-01-01/hour=13/abcd2.json.gz
    ...
    mainfolder/date=2019-01-15/hour=13/abcd74.json.gz

All JSON files have the same schema, and I want to make a crawler pointing to mainfolder/ which can then create a table in Athena for querying. I have already tried with just one file format; e.g., if the files are just json or just gz then the crawler works
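One workaround, if the mixed .json/.json.gz layout trips up a single crawler, is to scope two crawlers (or two S3 targets) by format using exclusion patterns. A hedged boto3 sketch — the role ARN, database, crawler names, and bucket are all placeholders, and whether the split is needed at all depends on how the built-in JSON classifier handles your data:

```python
import boto3

glue = boto3.client("glue")

# One crawler per format, each excluding the other format's files
# (Glue exclusion patterns are glob-style).
for name, exclusions in [
    ("mainfolder-json", ["**.gz"]),    # crawl plain .json only
    ("mainfolder-gz", ["**.json"]),    # crawl .json.gz only
]:
    glue.create_crawler(
        Name=name,
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
        DatabaseName="mydb",                                    # placeholder
        Targets={"S3Targets": [{
            "Path": "s3://my-bucket/mainfolder/",               # placeholder bucket
            "Exclusions": exclusions,
        }]},
    )
```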

terraform backend s3 bucket creation returns 403 w/ Terraform 0.11.1

Posted by China☆狼群 on 2020-06-09 03:28:38
Question: How do I create an S3 bucket that has access to put a terraform.tfstate file? How do I get the tfstate into the bucket? What is the proper way to do this? To preface, I have spent over 6 hours trying to figure this out. I saw a similar post with a problem caused by MFA; that's not my issue. I'm using the same code to create EC2 instances, VPCs and other resources just fine.

    ---[ REQUEST POST-SIGN ]-----------------------------
    GET /?prefix=env%3A%2F HTTP/1.1
    Host: tfstate-neonaluminum.s3.us
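The usual pattern is to create the state bucket out-of-band (console, CLI, or a script) and then let `terraform init` upload the state; the backend block cannot create its own bucket. A minimal boto3 sketch using the bucket name from the trace — the region is an assumption, since the Host header above is truncated:

```python
import boto3

# The region is a guess (the Host header in the trace is truncated); a
# mismatch between the backend's region and the bucket's real region is a
# classic source of 403s.
s3 = boto3.client("s3", region_name="us-east-1")

s3.create_bucket(Bucket="tfstate-neonaluminum")
# (Outside us-east-1 you would also pass
#  CreateBucketConfiguration={"LocationConstraint": "<region>"}.)

# Versioning is strongly recommended for Terraform state buckets.
s3.put_bucket_versioning(
    Bucket="tfstate-neonaluminum",
    VersioningConfiguration={"Status": "Enabled"},
)
```

With the bucket in place, point the `backend "s3"` block at it (bucket, key, region) and run `terraform init`; Terraform writes terraform.tfstate to the configured key itself.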
