Reading parquet files from multiple directories in Pyspark

北恋 2020-12-03 15:21

I need to read parquet files from multiple paths that are not parent or child directories.

For example:

dir1 ---
       |
       ------- dir1_1
            


        
5 Answers
  •  余生分开走
    2020-12-03 15:56

    For ORC (in PySpark, pass multiple paths as a list):

    spark.read.orc(["/dir1/*", "/dir2/*"])

    Spark goes into the dir1/ and dir2/ folders and loads all the ORC files.

    For Parquet:

    spark.read.parquet("/dir1/*","/dir2/*")
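
The Parquet call above can be wrapped in a small helper when the directory list is built at runtime. This is only a sketch: `read_parquet_dirs` is a hypothetical name, `spark` is assumed to be an existing SparkSession, and `/dir1` and `/dir2` are the placeholder paths from the answer. It relies on `DataFrameReader.parquet` accepting a variable number of paths.

```python
def read_parquet_dirs(spark, dirs):
    """Read Parquet files from several unrelated directories into one DataFrame.

    `spark` is an existing SparkSession; `dirs` is a list of directory paths.
    (Hypothetical helper, not part of PySpark.)
    """
    if not dirs:
        raise ValueError("at least one directory is required")
    # Append a glob so the entries directly inside each directory are matched.
    paths = [d.rstrip("/") + "/*" for d in dirs]
    # DataFrameReader.parquet takes a variable number of paths, so unpack the list.
    return spark.read.parquet(*paths)


# Usage, assuming pyspark is installed and the placeholder directories exist:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = read_parquet_dirs(spark, ["/dir1", "/dir2"])
```

All directories should hold files with compatible schemas, since the result is a single DataFrame.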
    
