AWS Glue to Redshift: Is it possible to replace, update or delete data?

Backend | Unresolved | 6 answers | 1921 views
执念已碎 · 2020-12-25 12:33

Here are some bullet points on how I have things set up:

  • I have CSV files uploaded to S3 and a Glue crawler set up to create the table and schema.
6 Answers
  •  孤独总比滥情好
    2020-12-25 13:18

    I tested this today and found a workaround to update/delete rows in the target table using a JDBC connection.

    Here is what I used:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    
    import pg8000
    args = getResolvedOptions(sys.argv, [
        'JOB_NAME',
        'PW',
        'HOST',
        'USER',
        'DB'
    ])
    # ...
    # Create Spark & Glue context
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    
    # ...
    config_port = ****
    # Open a direct connection to Redshift (pg8000 speaks the Postgres wire protocol)
    conn = pg8000.connect(
        database=args['DB'], 
        user=args['USER'], 
        password=args['PW'],
        host=args['HOST'],
        port=config_port
    )
    query = "UPDATE table .....;"  # placeholder; put your UPDATE statement here
    
    cur = conn.cursor()
    cur.execute(query)
    conn.commit()
    cur.close()
    
    
    
    # Redshift/Postgres multi-table DELETE uses USING (AAA, BBB are placeholder tables)
    query1 = "DELETE FROM AAA USING BBB WHERE AAA.id = BBB.id"
    
    cur1 = conn.cursor()
    cur1.execute(query1)
    conn.commit()
    cur1.close()
    conn.close()
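
    The UPDATE/DELETE above covers in-place changes; to fully "replace" rows (an upsert), which Redshift does not support natively, a common pattern over the same connection is a staging table: delete the target rows that match staged keys, then insert the staged rows. A minimal sketch that just builds the statements — `target_table`, `staging_table`, and the `id` key are hypothetical names, not from the post:

    ```python
    def build_upsert_sql(target, staging, key):
        """Build delete-then-insert statements for a Redshift
        staging-table upsert (placeholder table/column names)."""
        return [
            # Drop target rows that are about to be replaced
            f"DELETE FROM {target} USING {staging} "
            f"WHERE {target}.{key} = {staging}.{key};",
            # Move the staged rows into the target
            f"INSERT INTO {target} SELECT * FROM {staging};",
            # Empty the staging table for the next load
            # (note: TRUNCATE commits implicitly in Redshift)
            f"TRUNCATE {staging};",
        ]

    for stmt in build_upsert_sql("target_table", "staging_table", "id"):
        print(stmt)
    ```

    Each statement can then be run with `cur.execute(stmt)` on the pg8000 connection above, followed by `conn.commit()`.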
    
