How to combine small Parquet files into one large Parquet file? [duplicate]
This question already has answers here: Spark dataframe write method writing many small files (6 answers). Closed last year.

I have some partitioned Hive tables that point to Parquet files. Each partition now contains many small Parquet files, each around 5 KB, and I want to merge those small files into one large file per partition. How can I do this to improve my Hive query performance? I have tried reading all the Parquet files in the partition into a PySpark DataFrame and