I have a job that loads a DataFrame and saves the data in parquet format using the DataFrame `partitionBy` method. I then publish the created paths so that other applications can load them.
If the other application loads a specific partition, as the path in load("hdfs://localhost:9000/ptest/id=0/") suggests, that application can tweak its code to replace the null with the partition column's value:
part = 0  # partition to load
df2 = (spark.read.format("parquet")
       .schema(df.schema)
       .load("ptest/id=" + str(part))
       .fillna(part, ["id"]))
That way, the output will be:
+---+-----+------+
| id|score|letter|
+---+-----+------+
| 0| 1| A|
| 0| 1| B|
| 0| 2| C|
+---+-----+------+