Multiple Spark jobs appending Parquet data to the same base path with partitioning
I have multiple jobs that I want to execute in parallel, each appending daily data into the same base path using partitioning, e.g.:

```java
dataFrame.write()
    .partitionBy("eventDate", "category")
    .mode(SaveMode.Append)
    .parquet("s3://bucket/save/path");
```

Job 1 - category = "billing_events"
Job 2 - category = "click_events"

Both of these jobs truncate any existing partitions in the S3 bucket prior to execution and then save the resulting Parquet files to their respective partitions, i.e.:

job 1 -> s3://bucket/save/path/eventDate=20160101/category=billing_events
job 2 -> s3://bucket/save/path/eventDate=20160101/category=click_events
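For context, here is a minimal sketch of what one such job looks like, parameterized by category and event date so that job 1 and job 2 differ only in their arguments. The class name `DailyAppendJob`, the staging input path, and the `withColumn` tagging are assumptions for illustration (and the partition-truncation step is omitted); only the output path and partition columns come from the setup described above.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.lit;

public class DailyAppendJob {
    public static void main(String[] args) {
        String category = args[0];   // e.g. "billing_events" or "click_events"
        String eventDate = args[1];  // e.g. "20160101"

        SparkSession spark = SparkSession.builder()
                .appName("daily-append-" + category)
                .getOrCreate();

        // Hypothetical staging location holding the day's raw data for this category.
        Dataset<Row> daily = spark.read()
                .parquet("s3://bucket/staging/" + category + "/" + eventDate);

        // Tag the rows with the partition columns and append under the shared base path.
        daily.withColumn("eventDate", lit(eventDate))
             .withColumn("category", lit(category))
             .write()
             .partitionBy("eventDate", "category")
             .mode(SaveMode.Append)
             .parquet("s3://bucket/save/path");

        spark.stop();
    }
}
```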