I'm having trouble finding in the Spark documentation which operations cause a shuffle and which do not. In this list, which ones cause a shuffle and which ones do not?
This might be helpful: https://spark.apache.org/docs/latest/programming-guide.html#shuffle-operations
Or this: http://www.slideshare.net/SparkSummit/dev-ops-training, starting at slide 208.
From slide 209: "Transformations that use 'numPartitions' like distinct will probably shuffle"
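To make the slide's rule of thumb a bit more concrete, here is a minimal sketch in plain Python, not Spark code. It assumes a toy model where lists stand in for partitions; the names `map_partitions` and `distinct` are hypothetical helpers written for this illustration, not Spark APIs. The idea is that a map-style operation can work on each partition in isolation, while something like distinct has to bring equal values together, which generally means moving records between partitions.

```python
# Toy model only: plain Python lists stand in for Spark partitions.
# The function names below are hypothetical and are not Spark APIs.

def map_partitions(partitions, fn):
    """Narrow operation: each output partition depends on one input partition."""
    return [[fn(x) for x in part] for part in partitions]

def distinct(partitions, num_partitions):
    """Wide operation: equal values may sit in different input partitions,
    so every record is routed by hash to a target partition (the "shuffle")."""
    shuffled = [[] for _ in range(num_partitions)]
    moved = 0  # records that end up in a different partition index
    for src, part in enumerate(partitions):
        for x in part:
            dst = hash(x) % num_partitions
            if dst != src:
                moved += 1
            shuffled[dst].append(x)
    # Only after routing can duplicates be removed locally.
    deduped = [sorted(set(part)) for part in shuffled]
    return deduped, moved

partitions = [[1, 2, 3], [3, 4, 1]]
doubled = map_partitions(partitions, lambda x: x * 2)    # no data movement
deduped, moved = distinct(partitions, num_partitions=2)  # records cross partitions
print(doubled, deduped, moved)
```

In this toy run the map step never looks outside a partition, whereas the distinct step relocates some records before it can drop duplicates. That is roughly why an operation that accepts a `numPartitions` argument is a hint that a shuffle is likely.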