When using Relationalize in Glue there is no id in root table

Posted by 谁说胖子不能爱 on 2020-01-05 06:08:00

Question


I have a DynamicFrame in Glue and I am using the Relationalize method, which creates three new dynamic frames: root_table, root_table_1, and root_table_2.

When I print the schema of the tables, or after I insert the tables into the database, I notice that the id is missing from root_table, so I cannot join root_table to the other tables.

I tried all the possible combinations.

Is there something I am missing?

# Flatten the nested frame into a collection of relational frames
datasource1 = Relationalize.apply(frame = renameId, name = "root_ds", transformation_ctx = "datasource1")
print(datasource1.keys())
print(datasource1.values())

# Write each resulting frame to its own Redshift table
for df_name in datasource1.keys():
    m_df = datasource1.select(df_name)
    print("Writing to Redshift table: " + df_name)
    m_df.printSchema()

    glueContext.write_dynamic_frame.from_jdbc_conf(frame = m_df, catalog_connection = "Redshift", connection_options = {"database" : "redshift", "dbtable" : df_name}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "df_to_db")

Answer 1:


I used the code below (with the imports removed) on your data and wrote the output to S3. I got two files, pasted after the code. I am reading from the Glue catalog after running the crawler on your data.

# Read the crawled source table from the Glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "sampledb", table_name = "json_aws_glue_relationalize_stackoverflow", transformation_ctx = "datasource0")

# Flatten nested fields; the second argument is the S3 staging path
dfc = datasource0.relationalize("advertise_root", "s3://aws-glue-temporary-009551040880-ap-southeast-2/")

# Write each resulting frame to its own S3 prefix as CSV
for df_name in dfc.keys():
    m_df = dfc.select(df_name)
    print("Writing to S3 file: " + df_name)
    datasink2 = glueContext.write_dynamic_frame.from_options(frame = m_df, connection_type = "s3", connection_options = {"path": "s3://aws-glue-relationalize-stackoverflow/" + df_name +"/"}, format = "csv", transformation_ctx = "datasink2")

job.commit()

Main table:

advertiserCountry,advertiserId,amendReason,amended,clickDate,clickDevice,clickRefs.clickRef2,clickRefs.clickRef6,commissionAmount.amount,"commissionAmount.currency","commissionSharingPublisherId",commissionStatus,customParameters,customerCountry,declineReason,id,ipHash,lapseTime,oldCommissionAmount,oldSaleAmount,orderRef,originalSaleAmount,paidToPublisher,paymentId,publisherId,publisherUrl,saleAmount.amount,saleAmount.currency,siteName,transactionDate,transactionDevice,transactionParts,transactionQueryId,type,url,validationDate,voucherCode,voucherCodeUsed,partition_0
AT,123456,,false,2018-09-05T16:31:00,iPhone,"asdsdedrfrgthyjukiloujhrdf45654565423212",www.website.at,1.5,EUR,,pending,,AT,,321547896,-27670654789123380,68,,,,,false,0,654987,,1.0,EUR,https://www.site.at,2018-09-05T16:32:00,iPhone,1,0,Lead,https://www.website.at,,,false,advertise

Another table, for the transaction parts:

id,index,"transactionParts.val.amount","transactionParts.val.commissionAmount","transactionParts.val.commissionGroupCode","transactionParts.val.commissionGroupId","transactionParts.val.commissionGroupName"
1,0,1.0,1.5,LEAD,654654,Lead

Glue generated a primary key column named "transactionParts" in the base table, and the id column in the transactionParts table is the foreign key to that column. As you can see, it preserved the original id column as it is.
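To make the key relationship concrete, here is a minimal plain-Python sketch (no Glue APIs involved) that joins the two CSV outputs shown above. It assumes trimmed copies of those files with only a few columns kept; the helper name join_root_to_parts and the output field names are hypothetical and chosen only for illustration.

```python
import csv
import io

# Trimmed copies of the two CSV outputs shown above (only a few columns kept).
root_csv = """advertiserId,id,transactionParts,type
123456,321547896,1,Lead
"""

parts_csv = """id,index,transactionParts.val.amount,transactionParts.val.commissionAmount
1,0,1.0,1.5
"""


def join_root_to_parts(root_text, parts_text):
    """Join root rows to child rows: root.transactionParts == child.id."""
    root_rows = list(csv.DictReader(io.StringIO(root_text)))
    parts_rows = list(csv.DictReader(io.StringIO(parts_text)))

    # Index the child rows by their "id" column, which acts as the foreign key.
    parts_by_key = {}
    for part in parts_rows:
        parts_by_key.setdefault(part["id"], []).append(part)

    joined = []
    for root in root_rows:
        # The generated key lives in the root column named after the array field.
        for part in parts_by_key.get(root["transactionParts"], []):
            joined.append({
                "root_id": root["id"],  # original id, preserved in the root table
                "part_index": part["index"],
                "amount": part["transactionParts.val.amount"],
                "commissionAmount": part["transactionParts.val.commissionAmount"],
            })
    return joined


joined_rows = join_root_to_parts(root_csv, parts_csv)
for row in joined_rows:
    print(row)
```

The point of the sketch is that the join key in the root table is the column named after the nested array (transactionParts here), not the original id column, so a join on root.id = child.id would match the wrong rows or none at all.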

Can you please try this code on your data and see if it works (changing the source table name to yours)? Try writing to S3 as CSV first to check whether that works. Please let me know your findings.



Source: https://stackoverflow.com/questions/52537132/when-using-relationalize-in-glue-there-is-no-id-in-root-table
