AWS Glue to Redshift: Is it possible to replace, update or delete data?

Backend | Unresolved | 6 answers | 1921 views
执念已碎 · 2020-12-25 12:33

Here are some bullet points on how I have things set up:

  • I have CSV files uploaded to S3 and a Glue crawler set up to create the table and schema.
6 Answers
  •  孤独总比滥情好
    2020-12-25 13:18

    I tested this today and found a workaround to update/delete rows in the target table using a JDBC connection.

    Here is what I used:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    
    import pg8000
    args = getResolvedOptions(sys.argv, [
        'JOB_NAME',
        'PW',
        'HOST',
        'USER',
        'DB'
    ])
    # ...
    # Create Spark & Glue context
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    
    # ...
    config_port = ****
    # Open a direct connection to Redshift (pg8000 speaks the Postgres wire protocol)
    conn = pg8000.connect(
        database=args['DB'], 
        user=args['USER'], 
        password=args['PW'],
        host=args['HOST'],
        port=config_port
    )
    query = "UPDATE table .....;"  # placeholder; put your UPDATE statement here
    
    cur = conn.cursor()
    cur.execute(query)
    conn.commit()
    cur.close()
    
    
    
    # Redshift/Postgres multi-table DELETE uses USING (AAA, BBB are placeholder tables)
    query1 = "DELETE FROM AAA USING BBB WHERE AAA.id = BBB.id"
    
    cur1 = conn.cursor()
    cur1.execute(query1)
    conn.commit()
    cur1.close()
    conn.close()
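
    The UPDATE/DELETE above covers in-place changes; to fully "replace" rows (an upsert), which Redshift does not support natively, a common pattern over the same connection is a staging table: delete the target rows that match staged keys, then insert the staged rows. A minimal sketch that just builds the statements — `target_table`, `staging_table`, and the `id` key are hypothetical names, not from the post:

    ```python
    def build_upsert_sql(target, staging, key):
        """Build delete-then-insert statements for a Redshift
        staging-table upsert (placeholder table/column names)."""
        return [
            # Drop target rows that are about to be replaced
            f"DELETE FROM {target} USING {staging} "
            f"WHERE {target}.{key} = {staging}.{key};",
            # Move the staged rows into the target
            f"INSERT INTO {target} SELECT * FROM {staging};",
            # Empty the staging table for the next load
            # (note: TRUNCATE commits implicitly in Redshift)
            f"TRUNCATE {staging};",
        ]

    for stmt in build_upsert_sql("target_table", "staging_table", "id"):
        print(stmt)
    ```

    Each statement can then be run with `cur.execute(stmt)` on the pg8000 connection above, followed by `conn.commit()`.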
    
