I have multiple jobs that I want to execute in parallel, each appending daily data into the same path using partitioning.
Instead of using partitionBy like this:
// requires: import org.apache.spark.sql.SaveMode;
dataFrame.write()
    .partitionBy("eventDate", "channel")
    .mode(SaveMode.Append)
    .parquet("s3://bucket/save/path");
Alternatively, you can write the files straight into those partition directories yourself, so that the concurrent jobs never write through the same path (concurrent appends via one shared root can collide in the output committer's shared _temporary staging directory):
In job-1, specify the parquet file path as:
dataFrame.write()
    .mode(SaveMode.Append)
    .parquet("s3://bucket/save/path/eventDate=20160101/channel=billing_events");
& in job-2 specify the parquet file path as :
dataFrame.write()
    .mode(SaveMode.Append)
    .parquet("s3://bucket/save/path/eventDate=20160101/channel=click_events");