amazon-s3

Spark: Writing RDD Results to File System is Slow

*爱你&永不变心* submitted on 2020-01-03 05:47:06
Question: I'm developing a Spark application with Scala. My application consists of only one operation that requires shuffling (namely cogroup). It runs flawlessly and in a reasonable time. The issue I'm facing is when I want to write the results back to the file system; for some reason, it takes longer than running the actual program. At first, I tried writing the results without re-partitioning or coalescing, and I realized that the number of generated files is huge, so I thought that was the issue
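
A common remedy is to shrink the number of output partitions right before writing, since each partition becomes one output file. Below is a minimal Scala sketch of that approach; the RDD type, partition count, and output path are placeholders rather than details from the question:

```scala
import org.apache.spark.rdd.RDD

object WriteSketch {
  // `results` stands in for the cogrouped RDD from the question; the
  // partition count and output path are placeholders.
  def writeCoalesced(results: RDD[String]): Unit = {
    // Each partition becomes one output file, so collapsing thousands of
    // post-shuffle partitions into a handful avoids the many-small-files
    // overhead. coalesce narrows partitions without a full shuffle,
    // unlike repartition.
    results
      .coalesce(16)
      .saveAsTextFile("s3a://my-bucket/output") // placeholder path
  }
}
```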

Carrierwave processed images not uploading to AWS S3

两盒软妹~` submitted on 2020-01-03 03:43:12
Question: Similar problem to this question, but the solution provided did not fix my problem: Carrierwave processed images not uploading to S3. Using Railscast #383 as the basis for the code: http://railscasts.com/episodes/383-uploading-to-amazon-s3 The image is successfully uploaded to S3 using carrierwave_direct. I want to process the images in the background with Sidekiq and then upload them to S3. The Sidekiq worker completes the image processing without error, but the processed images (:thumb and :large) are

SparkR on RStudio - cannot access S3

天大地大妈咪最大 submitted on 2020-01-03 02:23:09
Question: I have installed SparkR with R (and RStudio) on EC2. I'm trying to read files located on S3: temp <- textFile(sc, "s3://dev.xxxx.com/txttest") and get: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively). I've tried to add my access key + secret like so: temp <- textFile(sc, "s3:{access_key:secret_key}

Can someone explain to me what Amazon Web Services components are used in a normal web service?

試著忘記壹切 submitted on 2020-01-03 00:32:12
Question: The web service that I want to run on AWS has to store and retrieve user data, present it to the user via a website, and needs to be able to parse the sitemaps of a few thousand sites every 10 minutes or so. Which AWS components, such as S3, EC2, and CloudFront, do I need to use? A short synopsis of the purpose of each component would be nice. :) I particularly do not understand the purpose of the Simple Queue Service. Answer 1: You might, for example, use EC2 (an on-demand, scalable VPS) to host
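
On the SQS question specifically: its purpose is to decouple whatever schedules the sitemap-parsing work from the EC2 workers that perform it, so workers can be scaled independently and a crashed worker's message simply becomes visible again. A minimal Scala sketch using the AWS SDK for Java (v1); the queue URL and message body are placeholders:

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.SendMessageRequest

object SqsSketch {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    // Placeholder queue URL.
    val queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/sitemap-jobs"

    // A scheduler enqueues one message per site to crawl...
    sqs.sendMessage(new SendMessageRequest(queueUrl, "https://example.com/sitemap.xml"))

    // ...and any number of workers poll the queue, each parsing sitemaps
    // independently and deleting messages once handled.
    sqs.receiveMessage(queueUrl).getMessages.forEach { m =>
      println(s"would parse: ${m.getBody}")
      sqs.deleteMessage(queueUrl, m.getReceiptHandle)
    }
  }
}
```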

Give a unique download link to my users for files hosted on Amazon S3

强颜欢笑 submitted on 2020-01-03 00:30:32
Question: I have an Amazon S3 account in which I'm storing MP3 files. I play these files in a music player on my web app. I want the users to be able to download the songs from my site. How can I give them a temporary link to download the file? Do I need to give them the path to the file on S3? I don't want the link to be shared with other people. How can this be done? P.S. I'm building the app with PHP and the music player with SoundManager 2. Answer 1: You can create URLs that expire at a specific time.
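
The standard mechanism here is an S3 pre-signed URL, which expires after a chosen time. The question is about PHP (the AWS SDK for PHP offers the same feature), but as a language-neutral illustration, here is a minimal sketch using the AWS SDK for Java (v1) from Scala; the bucket and key are placeholders:

```scala
import java.util.Date

import com.amazonaws.HttpMethod
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest

object PresignedUrlSketch {
  def main(args: Array[String]): Unit = {
    val s3 = AmazonS3ClientBuilder.defaultClient()

    // The link stops working one hour from now.
    val expiration = new Date(System.currentTimeMillis() + 60 * 60 * 1000)

    // Placeholder bucket and key.
    val request = new GeneratePresignedUrlRequest("my-music-bucket", "songs/track01.mp3")
      .withMethod(HttpMethod.GET)
      .withExpiration(expiration)

    // Hand this URL to the logged-in user; it carries a signature, so the
    // permanent S3 path never has to be exposed or made public.
    println(s3.generatePresignedUrl(request))
  }
}
```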

External checkpoints to S3 on EMR

放肆的年华 submitted on 2020-01-02 22:07:25
Question: I am trying to deploy a production cluster for my Flink program. I am using a standard hadoop-core EMR cluster with Flink 1.3.2 installed, using YARN to run it. I am trying to configure RocksDB to write my checkpoints to an S3 bucket, following these docs: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#set-s3-filesystem. The problem seems to be getting the dependencies working correctly. I receive this error when trying to run the program: java.lang
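
For context, the application side of this setup is small once the right S3 filesystem jars are on Flink's classpath (which is the dependency problem the question describes). A minimal Scala sketch, assuming Flink 1.3 with the flink-statebackend-rocksdb dependency available; the bucket path is a placeholder:

```scala
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala._

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60000) // checkpoint every 60 seconds

    // Placeholder bucket. On EMR the s3:// scheme is backed by EMRFS, so
    // the matching Hadoop S3 filesystem classes must be resolvable by Flink.
    env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/flink-checkpoints"))

    // Trivial stand-in topology so the sketch is runnable.
    env.fromElements(1, 2, 3).map(_ * 2).print()
    env.execute("checkpointed-job")
  }
}
```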

Using S3File for images in KeystoneJS

谁说胖子不能爱 submitted on 2020-01-02 19:29:10
Question: I'd like to know if it's possible, and how much effort it would be, to use S3File as an image field in KeystoneJS. My testing indicates that while you can indeed upload an image to an S3File, the admin interface treats it as an arbitrary file. The thing I'm missing the most is a preview function like Types.CloudinaryImage provides. Is the autogenerated admin interface easily extensible without resorting to ugly hacks? Or is it meant to be left untouched because of the simple fact that it

Rails: How to send file from S3 to remote server

女生的网名这么多〃 submitted on 2020-01-02 15:38:22
Question: I've been hunting around and can't seem to find a good solution for this. My Rails app stores its files in Amazon S3. I now need to send them to a remote (3rd-party) service. I'm using RestClient to post to the 3rd-party server like this: send_file = RestClient::Request.execute( :method => :post, :url => "http://remote-server-url.com", :payload => File.new("some_local_file.avi", 'rb'), :multipart => true, etc.... ) It works for local files, but how can I send a remote file from S3 directly

How to load files in SparkSQL through remote Hive storage (S3, ORC) using Spark/Scala + code + configuration

混江龙づ霸主 submitted on 2020-01-02 11:47:06
Question: IntelliJ (Spark) ---> Hive (remote) ---> storage on S3 (ORC format). Not able to read a remote Hive table through Spark/Scala; I was able to read the table schema but not the table itself. Error: Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively). import org.apache.spark.SparkConf import org
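
The error message itself names the missing properties. A minimal Scala sketch that sets them on the SparkContext's Hadoop configuration before querying Hive; the table name and key values are placeholders, and in practice IAM instance roles are preferable to hard-coded keys:

```scala
import org.apache.spark.sql.SparkSession

object HiveS3Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-s3-read")
      .enableHiveSupport()
      .getOrCreate()

    // Property names taken from the error message; the values are
    // placeholders (prefer instance roles or credential providers to
    // hard-coding keys in source).
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY")
    hadoopConf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY")

    // Placeholder database and table name.
    spark.sql("SELECT * FROM mydb.my_orc_table").show()
  }
}
```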

How to close the AWS connection if there is no such key in the query

天涯浪子 submitted on 2020-01-02 11:30:23
Question: I'm using the AWS Java SDK to upload files to a bucket I manage in the AWS Management Console. However, if there is no such file online the first time I try to access it, my code catches the exception (NoSuchKey). Then I want to close the connection. The problem is I don't have any reference to close that connection because of the exception (the original reference will be null). Here is my code: S3Object object = null; GetObjectRequest req = new GetObjectRequest(bucketName, fileName); try{ logconfig(
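
One way to reason about this: getObject throws before any S3Object is created, so on the failure path there is no per-request stream to close, and the client returns its pooled HTTP connection on its own. Only a successfully returned S3Object must be closed. A minimal Scala sketch against the AWS SDK for Java (v1), with placeholder bucket and key names:

```scala
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{AmazonS3Exception, GetObjectRequest, S3Object}

object GetObjectSketch {
  def main(args: Array[String]): Unit = {
    val s3 = AmazonS3ClientBuilder.defaultClient()
    var obj: S3Object = null // mirrors the question's null reference
    try {
      obj = s3.getObject(new GetObjectRequest("my-bucket", "maybe-missing.txt"))
      // ... read obj.getObjectContent here ...
    } catch {
      case e: AmazonS3Exception if e.getErrorCode == "NoSuchKey" =>
        // getObject failed before an S3Object (and its stream) existed,
        // so there is nothing to close for this request.
        println(s"key not found: ${e.getMessage}")
    } finally {
      // Only a successfully opened object holds a connection to release.
      if (obj != null) obj.close()
    }
  }
}
```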