Fast Parquet row count in Spark

孤城傲影 2021-01-04 09:11

Parquet files store a row count for each block (row group) in their footer metadata. Spark seems to read it at some point (SpecificParquetRecordReaderBase.java#L151).
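For reference, here is a minimal sketch of reading those per-block counts straight from the file footers, without scanning any data pages. It assumes the parquet-hadoop and Hadoop client classes are on the classpath (as they are in spark-shell). The helper names `footerRowCount` and `parquetRowCount` are hypothetical, and only files directly under the given directory are visited, so nested partition directories would need a recursive listing.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import scala.collection.JavaConverters._

// Hypothetical helper: sum the per-block (row group) row counts
// recorded in the footer of a single Parquet file.
def footerRowCount(file: Path, conf: Configuration): Long = {
  val reader = ParquetFileReader.open(HadoopInputFile.fromPath(file, conf))
  try reader.getFooter.getBlocks.asScala.map(_.getRowCount).sum
  finally reader.close()
}

// Hypothetical helper: add up the footer counts of every *.parquet file
// directly under `dir` (assumption: no nested partition directories).
def parquetRowCount(dir: String, conf: Configuration): Long = {
  val root = new Path(dir)
  val fs = root.getFileSystem(conf)
  fs.listStatus(root)
    .filter(s => s.isFile && s.getPath.getName.endsWith(".parquet"))
    .map(s => footerRowCount(s.getPath, conf))
    .sum
}
```

In spark-shell this could be called as `parquetRowCount("/path/to/table", spark.sparkContext.hadoopConfiguration)`, where the path is a placeholder. The footers are read on the driver one file at a time, so this is probably only worthwhile when the number of files is modest.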

I tried this in spark-

2 answers
  •  臣服心动
    2021-01-04 09:22

    We can also format the count for display with:

    java.text.NumberFormat.getIntegerInstance.format(sparkdf.count)
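A small usage sketch of that formatting call, assuming a literal value stands in for `sparkdf.count` (the name `rowCount` is hypothetical). Note that `NumberFormat` only changes how the number is displayed; `sparkdf.count` still runs as an ordinary Spark job.

```scala
import java.text.NumberFormat
import java.util.Locale

// Hypothetical stand-in for the Long returned by sparkdf.count.
val rowCount: Long = 1234567L

// Pin the locale so the grouping separator does not depend on the JVM default.
val formatted: String = NumberFormat.getIntegerInstance(Locale.US).format(rowCount)

println(formatted) // prints 1,234,567
```

Without an explicit locale, as in the one-liner above, the separator follows the JVM's default locale.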
