I have many gzipped files stored on S3, organized by project and by hour per day. The file paths follow this pattern:
s3:///proj
Using AWS EMR with Spark 2.0.0 and SparkR in RStudio, I've managed to read the gzip-compressed Wikipedia pagecount files stored in S3 with the command below:
df <- read.text("s3:///pagecounts-20110101-000000.gz")
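read.text gives you a SparkDataFrame with a single string column named value, one row per input line. If you want real columns, you have to split each line yourself. Here is a minimal sketch, assuming the standard space-separated pagecounts layout (project code, page title, view count, bytes transferred); the parsed name is just for illustration:

library(SparkR)
sparkR.session()  # on EMR the Spark session is typically already configured

# Same single-file read as above
df <- read.text("s3:///pagecounts-20110101-000000.gz")

# Split each space-delimited line into typed columns via SQL expressions
parsed <- selectExpr(df,
  "split(value, ' ')[0] AS project",
  "split(value, ' ')[1] AS page",
  "CAST(split(value, ' ')[2] AS INT) AS views",
  "CAST(split(value, ' ')[3] AS BIGINT) AS bytes")

head(parsed)  # inspect the first few parsed rows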
Similarly, to read all the files for January 2011 at once, pass a glob pattern to read.text:
df <- read.text("s3:///pagecounts-201101??-*.gz")
See the SparkR API docs for more ways of reading the data: https://spark.apache.org/docs/latest/api/R/read.text.html
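For the project-and-hour layout in your question, globs can be applied at every directory level, not just in the file name. Your exact path pattern is truncated above, so the sketch below assumes a hypothetical <project>/<yyyy>/<MM>/<dd>/<HH> hierarchy and a made-up project name, purely for illustration:

# Hypothetical layout: s3:///<project>/<yyyy>/<MM>/<dd>/<HH>/*.gz
# Every hour of every day in Jan 2011 for one project:
df_jan <- read.text("s3:///myproject/2011/01/*/*/*.gz")
count(df_jan)  # total number of lines across all matched files

This works because Spark's Hadoop input layer expands the glob before listing the S3 objects, so all matched files are read into one SparkDataFrame.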