How to write a PySpark DataFrame to a DynamoDB table?

长发绾君心 · asked 2021-01-16 05:03

How to write a PySpark DataFrame to a DynamoDB table? I did not find much info on this. As per my requirement, I have to write a PySpark DataFrame to a DynamoDB table. Overall I need

1 Answer
  • answered 2021-01-16 05:40

    Ram, there's no way to do that directly from PySpark. If you have pipeline software running, it can be done in a series of steps. Here is how:

    1. Create a temporary Hive staging table, like:

      CREATE TABLE temp (column1 type, column2 type, ...) STORED AS ORC;

    2. Run your PySpark job and write your data into it (a fuller, self-contained sketch follows after this list):

      dataframe.createOrReplaceTempView("df")
      spark.sql("INSERT OVERWRITE TABLE temp SELECT * FROM df")

    3. Create the DynamoDB connector table:

      CREATE EXTERNAL TABLE temptodynamo (column1 type, column2 type, ...)
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
      TBLPROPERTIES (
          "dynamodb.table.name" = "temp-to-dynamo",
          "dynamodb.column.mapping" = "column1:column1,column2:column2,..."
      );

    4. Overwrite the connector table with your staging table; this is the step that actually writes the rows to DynamoDB:

      INSERT OVERWRITE TABLE temptodynamo SELECT * FROM temp;
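
    A minimal, self-contained sketch of step 2 follows, assuming an EMR cluster where Spark is configured with Hive support and the temp table from step 1 already exists; the column names and sample rows are illustrative placeholders, not from the original question.

      from pyspark.sql import SparkSession

      # Hive support is needed so spark.sql() can see the metastore table.
      spark = (
          SparkSession.builder
          .appName("stage-for-dynamodb")
          .enableHiveSupport()
          .getOrCreate()
      )

      # Stand-in DataFrame so the sketch runs on its own; in practice this
      # is whatever DataFrame you need to land in DynamoDB.
      dataframe = spark.createDataFrame(
          [(1, "alice"), (2, "bob")],
          ["column1", "column2"],
      )

      dataframe.createOrReplaceTempView("df")
      spark.sql("INSERT OVERWRITE TABLE temp SELECT * FROM df")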

    More info here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/EMR_Hive_Commands.html
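
    If you want to drive the Hive steps from one script, here is a hedged sketch: Spark SQL generally cannot create tables that use a Hive storage handler (the STORED BY clause), so steps 1, 3 and 4 are shelled out to the Hive CLI, which is on the PATH of EMR master nodes. The table names, column types and column mapping below are illustrative placeholders.

      import subprocess

      def run_hive(sql: str) -> None:
          # Run a batch of HiveQL statements with the Hive CLI; fail loudly.
          subprocess.run(["hive", "-e", sql], check=True)

      # Step 1: staging table (run this before the PySpark job above).
      run_hive(
          "CREATE TABLE IF NOT EXISTS temp "
          "(column1 BIGINT, column2 STRING) STORED AS ORC;"
      )

      # ... submit the PySpark job from step 2 here ...

      # Steps 3 and 4: connector table, then the copy that actually writes
      # every row of temp into the DynamoDB table temp-to-dynamo.
      run_hive("""
          CREATE EXTERNAL TABLE IF NOT EXISTS temptodynamo
              (column1 BIGINT, column2 STRING)
          STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
          TBLPROPERTIES (
              "dynamodb.table.name" = "temp-to-dynamo",
              "dynamodb.column.mapping" = "column1:column1,column2:column2"
          );
          INSERT OVERWRITE TABLE temptodynamo SELECT * FROM temp;
      """)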
