Is there a way to prevent PySpark from creating several small files when writing a DataFrame to a JSON file?
If I run:

df.write.format('json').save(output_path)

Spark writes one part file per partition, so I end up with many small files instead of a single one.
This was a better solution for me.

rdd.map(json.dumps).saveAsTextFile(json_lines_file_name)
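To illustrate what that map step produces, here is a minimal local sketch (no Spark needed) of the per-record serialization: json.dumps turns each element into one JSON Lines record, which saveAsTextFile then writes one per line. The rows below are hypothetical sample data, not from the original post.

```python
import json

# Hypothetical rows standing in for the RDD's elements.
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# This mirrors rdd.map(json.dumps): one JSON Lines record per element.
lines = [json.dumps(r) for r in rows]
print(lines[0])  # {"id": 1, "name": "alice"}

# Note: saveAsTextFile still writes one file per partition, so to get a
# single output file, coalesce to one partition first, e.g.:
#   rdd.map(json.dumps).coalesce(1).saveAsTextFile(json_lines_file_name)
```

The coalesce(1) call is what actually collapses the output into a single part file; without it, this approach produces as many files as the RDD has partitions.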