I have many gzipped files stored on S3, organized by project and by hour of day; the file paths follow this pattern:
s3:///proj
Note: Under Spark 1.2, the proper format would be as follows:
val rdd = sc.textFile("s3n://<bucket>/<directory>/bar.*.gz")
That's s3n://, not s3://.
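Since the files are bucketed by hour, Hadoop-style globs and comma-separated path lists both work with textFile, so a whole day or a range of hours can be read into one RDD. A minimal sketch, assuming a hypothetical <bucket>/<project>/<date>/<hour> layout (the angle-bracket names are placeholders, not your actual paths):

// Placeholders in <angle brackets> are hypothetical; adjust to your layout.
// All hours of one day via a * glob on the hour segment:
val oneDay = sc.textFile("s3n://<bucket>/<project>/20141201/*/bar.*.gz")
// A specific set of hours via Hadoop's {a,b,...} alternation glob:
val morning = sc.textFile("s3n://<bucket>/<project>/20141201/{06,07,08}/bar.*.gz")
// Multiple independent paths, comma-separated in a single call:
val twoDays = sc.textFile(
  "s3n://<bucket>/<project>/20141201/*/bar.*.gz," +
  "s3n://<bucket>/<project>/20141202/*/bar.*.gz")

Spark decompresses .gz files transparently based on the extension, but keep in mind that gzip is not splittable, so each gzipped file is read as a single partition.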
You'll also want to put your credentials in conf/spark-env.sh as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
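If you'd rather not rely on the spark-env.sh exports, the same credentials can be set programmatically on the SparkContext's Hadoop configuration. A sketch, assuming the s3n connector's standard property names and that the variables are already in your environment:

// Alternative to the spark-env.sh exports: hand the s3n connector its
// credentials through the Hadoop configuration on the SparkContext.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))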