Reading Parquet files from multiple directories in PySpark

北恋 2020-12-03 15:21

I need to read parquet files from multiple paths that are not parent or child directories.

For example:

dir1 ---
       |
       ------- dir1_1
            


        
5 answers
  •  -上瘾入骨i
    2020-12-03 15:36

    Both the parquetFile method of SQLContext (Spark 1.x only; deprecated since 1.4 in favor of read.parquet) and the parquet method of DataFrameReader accept multiple paths, so either of these works:

    df = sqlContext.parquetFile('/dir1/dir1_2', '/dir2/dir2_1')
    

    or

    df = sqlContext.read.parquet('/dir1/dir1_2', '/dir2/dir2_1')
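
If the paths are held in a list rather than written out by hand, they can be unpacked into the same call. The following is a minimal sketch, not part of the original answer: `read_parquet_dirs` is a hypothetical helper name, and `spark` is assumed to be either a `SparkSession` (Spark 2.0+) or the `sqlContext` used above, since both expose `.read`.

```python
def read_parquet_dirs(spark, paths):
    """Read several unrelated Parquet directories into one DataFrame.

    `spark` is assumed to be a SparkSession or SQLContext (anything with
    a `.read` DataFrameReader); `paths` is any iterable of path strings.
    """
    paths = list(paths)
    if not paths:
        # Fail early with a clear message instead of passing no paths to Spark.
        raise ValueError("at least one path is required")
    # DataFrameReader.parquet takes variadic paths, so unpack the list.
    return spark.read.parquet(*paths)
```

With a session in hand, `read_parquet_dirs(spark, ['/dir1/dir1_2', '/dir2/dir2_1'])` is equivalent to the second call above. The directories are expected to hold files with compatible schemas.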
    
