So I have just 1 parquet file I'm reading with Spark (using the SQL stuff) and I'd like it to be processed with 100 partitions. I've tried setting spark.default.parallelism, but it doesn't seem to have any effect.
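For reference, here's roughly what I'm doing, as a self-contained sketch (the app name and file path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// What I've been trying, roughly: spark.default.parallelism set to the
// desired 100 partitions before reading the file.
val conf = new SparkConf()
  .setAppName("read-parquet")
  .set("spark.default.parallelism", "100")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

val table = sqlContext.parquetFile("the-table.parquet")
println(table.partitions.length)  // still nowhere near 100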
Maybe your parquet file only takes up one HDFS block. Create a big parquet file that spans many HDFS blocks and load it:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)  // parquetFile is a SQLContext method in Spark 1.1
val k = sqlContext.parquetFile("the-big-table.parquet")
k.partitions.length
You'll see the same number of partitions as HDFS blocks. This worked fine for me (Spark 1.1.0).
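If it helps, here's a rough sketch of the kind of test I mean, using the Spark 1.1-era SQL API in spark-shell; the case class, row count, and payload are placeholders, so adjust them until the output file really does span several HDFS blocks:

import org.apache.spark.sql.SQLContext

// Write a table big enough to cover several HDFS blocks, then read it back
// and compare partitions to blocks. You can verify the block layout with:
//   hdfs fsck /path/to/the-big-table.parquet -files -blocks
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

case class Record(id: Int, payload: String)
val big = sc.parallelize(1 to 20000000).map(i => Record(i, s"row-$i-" + ("x" * 64)))
big.saveAsParquetFile("the-big-table.parquet")

val k = sqlContext.parquetFile("the-big-table.parquet")
k.partitions.length  // should roughly match the number of HDFS blocks

With only one block you'll keep getting one partition on read, no matter what spark.default.parallelism is set to.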