How to move data from Glue to Dynamodb

Submitted by 戏子无情 on 2019-12-05 00:28:53

Question


We are designing a big data solution for one of our dashboard applications and are seriously considering Glue for our initial ETL. Glue currently supports JDBC and S3 as targets, but our downstream services and components will work better with DynamoDB. We are wondering what the best approach is to eventually move the records from Glue to DynamoDB.

Should we write to S3 first and then run Lambdas to insert the data into DynamoDB? Is that the best practice? Or should we use a third-party JDBC wrapper for DynamoDB and have Glue write to it directly (not sure if this is possible, and it sounds a bit scary)? Or should we do something else?

Any help is greatly appreciated. Thanks!


Answer 1:


I am able to write using boto3... it is definitely not the best way to load data, but it is a working one. :)

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('BULK_DELIVERY')

print("Start testing")

for row in df1.rdd.collect():
    var1 = row.sourceCid
    print(var1)
    table.put_item(Item={'SOURCECID': "{}".format(var1)})

print("End testing")



Answer 2:


For your workload, Amazon actually recommends using AWS Data Pipeline.

It bypasses Glue, and it is mostly used to load S3 files into DynamoDB, but it may work for you.




Answer 3:


You can add the following lines to your Glue ETL script:

    glueContext.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
        connection_type="dynamodb",
        connection_options={"tableName": "pceg_ae_test"},
    )

df here is a Spark DataFrame; DynamicFrame.fromDF wraps it in the DynamicFrame that write_dynamic_frame expects.
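For context, here is what a fuller version of that script might look like, with the boilerplate imports a Glue job needs. The sample data is illustrative, and note that newer Glue releases document the table option key as "dynamodb.output.tableName" rather than "tableName". This only runs inside a Glue job, since awsglue is not available locally:

```python
# Runs inside an AWS Glue job, not locally: the awsglue module only exists there.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# df is an ordinary Spark DataFrame produced earlier in the job;
# here it is stubbed with illustrative data.
df = glueContext.spark_session.createDataFrame(
    [("cid-1",), ("cid-2",)], ["SOURCECID"]
)

# Convert to a DynamicFrame and write straight to DynamoDB.
glueContext.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
    connection_type="dynamodb",
    connection_options={"dynamodb.output.tableName": "pceg_ae_test"},
)

job.commit()
```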



Source: https://stackoverflow.com/questions/49063554/how-to-move-data-from-glue-to-dynamodb
