I'm using PySpark to query a collection of Parquet files stored on HDFS. However, the query seems to respond faster the second time it runs. Below are th