You can load multiple files at once by passing multiple paths to the load method, e.g.
spark.read
  .format("com.databricks.spark.avro")
  .load("/data/day1", "/data/day2")   // example paths
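If the set of files is only known at runtime, the same variadic load(paths: String*) overload accepts an expanded Seq. A minimal sketch, with hypothetical directory names:

// days is a hypothetical list of input directories; `: _*` expands
// the Seq into the variadic load(paths: String*) overload.
val days = Seq("/data/day1", "/data/day2")

val df = spark.read
  .format("com.databricks.spark.avro")
  .load(days: _*)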
You can also use the paths option. From the Spark source code (ResolvedDataSource.scala):
val paths = {
  if (caseInsensitiveOptions.contains("paths") &&
    caseInsensitiveOptions.contains("path")) {
    throw new AnalysisException(s"Both path and paths options are present.")
  }
  caseInsensitiveOptions.get("paths")
    .map(_.split("(?<!\\\\),").map(StringUtils.unEscapeString(_, '\\', ',')))
    .getOrElse(Array(caseInsensitiveOptions("path")))
    .flatMap { pathString =>
      val hdfsPath = new Path(pathString)
      val fs = hdfsPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
      val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
      SparkHadoopUtil.get.globPathIfNecessary(qualified).map(_.toString)
    }
}
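Two details follow from that excerpt: the split regex (?<!\\), only breaks on commas that are not preceded by a backslash (the backslash is then removed by unEscapeString), and every resulting entry goes through globPathIfNecessary, so glob patterns are expanded. A small sketch of what that means for the option value, with hypothetical paths:

// A comma inside a path survives the split when escaped as "\,";
// globs such as "*" are expanded per entry by globPathIfNecessary.
val paths = Seq(
  "/data/2016-01-*",    // glob pattern, expanded to matching paths
  "/data/odd\\,name"    // literal comma, escaped so the split keeps it whole
)
val joined = paths.mkString(",")   // "/data/2016-01-*,/data/odd\,name"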
So a simple:
sqlContext.read.option("paths", paths.mkString(",")).load()
will do the trick.
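In full, with the Avro format set explicitly (the directory names are hypothetical):

// Build the comma-separated value the "paths" option expects and load
// everything in one read; glob patterns work per the excerpt above.
val paths = Seq("/data/day1", "/data/day2/part-*")

val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .option("paths", paths.mkString(","))
  .load()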