So I have just one Parquet file I'm reading with Spark (using the SQL stuff) and I'd like it to be processed with 100 partitions. I've tried setting spark.default.parallelism, but it doesn't seem to have any effect.
You mentioned that you want to control the distribution when writing to Parquet. When you write Parquet from an RDD, the output preserves the RDD's partitioning. So if you create an RDD with 100 partitions, turn it into a DataFrame, and write it out in Parquet format, Spark will write 100 separate Parquet files to the file system.
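A minimal sketch of that (the data, the output path, and the column name are just placeholders, and the local master is assumed for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("WriteParquetWith100Partitions")
  .master("local[*]") // assumption: local session for illustration
  .getOrCreate()
import spark.implicits._

// Create an RDD with an explicit 100 partitions (placeholder data).
val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 100)

// Converting to a DataFrame keeps the RDD's 100 partitions.
val df = rdd.toDF("value")
println(df.rdd.getNumPartitions) // 100

// Writing preserves that partitioning, so 100 part-*.parquet files
// land on the file system.
df.write.parquet("/tmp/output_parquet") // hypothetical output path
```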
For the read/processing side, you can set the spark.sql.shuffle.partitions parameter, which controls how many partitions Spark SQL uses after a shuffle (joins, aggregations, and so on).
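A hedged example of setting it, again assuming a local session and a hypothetical input path and column name:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReadParquetWith100ShufflePartitions")
  .master("local[*]") // assumption: local session for illustration
  .config("spark.sql.shuffle.partitions", "100")
  .getOrCreate()

val df = spark.read.parquet("/tmp/input.parquet") // hypothetical input path

// The read itself is split based on the file's size, not this setting,
// but any shuffle stage after it (like this aggregation) uses 100 partitions.
val counts = df.groupBy("value").count()
println(counts.rdd.getNumPartitions) // 100 (newer Spark versions with AQE enabled may coalesce this)
```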