Read range of files in pySpark

刺人心 2021-01-22 20:24

I need to read a contiguous range of files in PySpark. The following works for me:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is an existing SparkContext
df = sqlContext.read.parquet("events.parquet/exportDay=2015090[1-7]")

2 Answers
  •  忘掉有多难
    2021-01-22 21:01

    It uses shell globbing, I believe.

    See the post: How to read multiple text files into a single RDD?

    It seems to suggest that the following should work:

    "events.parquet/exportDay=2015090[89],events.parquet/exportDay=2015091[0-4]"
