Spark subquery scans whole partition

小蘑菇 2021-01-13 11:08

I have a Hive table partitioned by a 'date' field. I want to write a query to get the data from the latest (max) partition.

spark.sql("select field fr         


        
2 Answers
  •  情深已故
    2021-01-13 11:31

    Building on Ram's answer, there is a much simpler way to accomplish this that eliminates a lot of overhead: query the Hive metastore directly instead of executing a Spark SQL query. No need to reinvent the wheel:

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    import scala.collection.JavaConverters._

    // Build a HiveConf from the Hadoop configuration Spark is already using
    val hiveConf = new HiveConf(spark.sparkContext.hadoopConfiguration, classOf[HiveConf])
    val cli = new HiveMetaStoreClient(hiveConf)

    // Ask the metastore for the table's partition values and take the max;
    // substitute your own database and table names
    val maxPart = cli
      .listPartitions("db_name", "table_name", Short.MaxValue)
      .asScala
      .map(_.getValues.asScala.mkString(","))
      .max
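
    With the max partition value in hand, you can substitute it into the original query as a literal so Spark prunes the scan down to that single partition. A minimal sketch, assuming the generic names from the question (`table`, `field`, `date`):

    // Hypothetical follow-up: the filter is now a plain literal, so Spark
    // only reads the one matching partition instead of scanning them all
    val latest = spark.sql(s"select field from table where date = '$maxPart'")
    latest.show()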
    
