Spark SQL get max & min dynamically from datasource

Asked by 春和景丽, 2020-12-21 11:12

I am using Spark SQL, and I want to fetch the whole dataset every day from an Oracle table (more than 1,800k records). The application hangs when I read from Oracle.

1 Answer
  • 2020-12-21 11:59

    Just fetch required values from the database:

    url = ...
    properties = ...
    partition_column = ...
    table = ...
    
    # Push aggregation to the database
    # Push aggregation to the database.
    # Note: Oracle does not accept AS before a table alias,
    # so the subquery alias is written without it.
    query = "(SELECT min({0}), max({0}) FROM {1}) tmp".format(
        partition_column, table
    )
    
    (lower_bound, upper_bound) = (spark.read
        .jdbc(url=url, table=query, properties=properties)
        .first())
    

    and pass to the main query:

    num_partitions = ...
    
    spark.read.jdbc(
        url, table, 
        column=partition_column, 
        # Make upper bound inclusive 
        lowerBound=lower_bound, upperBound=upper_bound + 1, 
        numPartitions=num_partitions, properties=properties
    )
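
    For intuition, here is a simplified sketch of what Spark does with those bounds internally. `partition_predicates` is an illustrative name, not a Spark API; it mirrors the stride logic of Spark's `JDBCRelation.columnPartition`. Note that `lowerBound`/`upperBound` only decide the stride, they do not filter rows, which is why fetching the real min/max first matters:

    ```python
    # Hypothetical helper mimicking how Spark's JDBC reader turns
    # (lowerBound, upperBound, numPartitions) into per-partition WHERE clauses.
    def partition_predicates(column, lower_bound, upper_bound, num_partitions):
        # Integer stride, as in Spark's JDBCRelation.columnPartition
        stride = upper_bound // num_partitions - lower_bound // num_partitions
        predicates = []
        current = lower_bound + stride
        # The first partition also collects NULLs
        predicates.append(f"{column} < {current} OR {column} IS NULL")
        for _ in range(1, num_partitions - 1):
            predicates.append(
                f"{column} >= {current} AND {column} < {current + stride}"
            )
            current += stride
        # The last partition is open-ended: upperBound does not filter rows
        predicates.append(f"{column} >= {current}")
        return predicates

    # e.g. 4 partitions over ids 1..1,800,000 (upper bound made inclusive):
    for p in partition_predicates("id", 1, 1800001, 4):
        print(p)
    ```

    If the bounds are far from the real min/max, the stride is wrong and most rows pile into the first or last open-ended partition, which is the usual cause of one straggler task hanging.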
    