Not able to set number of shuffle partitions in PySpark
Question

I know that by default the number of partitions for shuffle tasks in Spark is set to 200, but I can't seem to change it. I'm running Jupyter with Spark 1.6 and loading a fairly small table (about 37K rows) from Hive using the following in my notebook:

```python
from pyspark.sql.functions import *

sqlContext.sql("set spark.sql.shuffle.partitions=10")
test = sqlContext.table('some_table')
print test.rdd.getNumPartitions()
print test.count()
```

The output confirms 200 tasks, and from the activity log it's spinning up 200 tasks.
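For what it's worth, here is a minimal sketch of what I would expect, assuming `spark.sql.shuffle.partitions` only applies to stages that actually shuffle data (e.g. a `groupBy` or join), while the partition count of the raw table read comes from the input splits. `some_column` below is a placeholder column name, and the setup (Spark 1.6, an existing `sqlContext`, the Hive table `some_table`) is the same as in my snippet above:

```python
# Assumptions: Spark 1.6, an existing HiveContext/SQLContext named sqlContext,
# a Hive table 'some_table', and a column 'some_column' (placeholder name).

# Set the shuffle-partition count before triggering any shuffle.
sqlContext.sql("set spark.sql.shuffle.partitions=10")
# Equivalent form via setConf:
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

test = sqlContext.table('some_table')

# Partitions of the raw read: determined by the input splits,
# not by spark.sql.shuffle.partitions.
print(test.rdd.getNumPartitions())

# Partitions after a shuffle (groupBy forces one): this is where the 10
# should show up if the setting took effect.
grouped = test.groupBy('some_column').count()
print(grouped.rdd.getNumPartitions())
```

If that understanding is right, the 200 tasks I'm seeing would come from the shuffle stage ignoring my setting, which is exactly what I can't explain.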