How to read multiple partitioned .gzip files into a Spark DataFrame?
Question

I have the following folder of partitioned data:

my_folder
|-- part-0000.gzip
|-- part-0001.gzip
|-- part-0002.gzip
|-- part-0003.gzip

I try to read this data into a DataFrame using:

>>> my_df = spark.read.csv("/path/to/my_folder/*")
>>> my_df.show(5)
+--------------------+
|                 _c0|
+--------------------+
|��[I���...|
|��RUu�[*Ք��g��T...|
|�t��� �qd��8~��...|
|�(���b4�:������I�...|
|���!y�)�PC��ќ\�...|
+--------------------+
only showing top 5 rows

I also tried this to check the data:

>>> rdd =
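For reference, here is a minimal, self-contained sketch of the read attempt above, assuming PySpark is installed and the files are gzip-compressed CSV; the path is a placeholder, not taken from the question. One thing worth noting: Spark delegates decompression to Hadoop compression codecs, which are chosen by file extension (gzip files are matched by the ".gz" suffix), so files named ".gzip" may be read as raw compressed bytes, which would explain the garbled rows shown above.

from pyspark.sql import SparkSession

# Build a local SparkSession so the snippet is runnable on its own.
spark = SparkSession.builder.appName("read-gzip-parts").getOrCreate()

# Read every part file in the folder as CSV. If the files carried a ".gz"
# extension, Spark/Hadoop would decompress them transparently; with a ".gzip"
# suffix the default codec lookup does not match, so the compressed bytes
# land in the DataFrame unchanged.
my_df = spark.read.csv("/path/to/my_folder/*")  # hypothetical path
my_df.show(5)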