Question
I have a DynamicFrame in Glue and I am using the Relationalize method, which creates three new dynamic frames: root_table, root_table_1 and root_table_2.
When I print the schema of the tables, or after I insert the tables into the database, I notice that the id is missing from root_table, so I cannot join root_table to the other tables.
I tried all the possible combinations.
Is there something I am missing?
datasource1 = Relationalize.apply(frame = renameId, name = "root_ds", transformation_ctx = "datasource1")
print(datasource1.keys())
print(datasource1.values())
# Write every frame produced by Relationalize to its own Redshift table
for df_name in datasource1.keys():
    m_df = datasource1.select(df_name)
    print("Writing to Redshift table: " + df_name)
    m_df.printSchema()
    glueContext.write_dynamic_frame.from_jdbc_conf(frame = m_df, catalog_connection = "Redshift", connection_options = {"database" : "redshift", "dbtable" : df_name}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "df_to_db")
Answer 1:
I ran the code below (with the imports removed) on your data and wrote the output to S3. I got two files, pasted after the code. I am reading from the Glue catalog after running the crawler on your data.
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "sampledb", table_name = "json_aws_glue_relationalize_stackoverflow", transformation_ctx = "datasource0")
# relationalize(root_table_name, staging_path) returns a collection of frames
dfc = datasource0.relationalize("advertise_root", "s3://aws-glue-temporary-009551040880-ap-southeast-2/")
# Write every frame in the collection to its own S3 prefix as CSV
for df_name in dfc.keys():
    m_df = dfc.select(df_name)
    print("Writing to S3 file: " + df_name)
    datasink2 = glueContext.write_dynamic_frame.from_options(frame = m_df, connection_type = "s3", connection_options = {"path": "s3://aws-glue-relationalize-stackoverflow/" + df_name +"/"}, format = "csv", transformation_ctx = "datasink2")
job.commit()
Main table:
advertiserCountry,advertiserId,amendReason,amended,clickDate,clickDevice,clickRefs.clickRef2,clickRefs.clickRef6,commissionAmount.amount,"commissionAmount.currency","commissionSharingPublisherId",commissionStatus,customParameters,customerCountry,declineReason,id,ipHash,lapseTime,oldCommissionAmount,oldSaleAmount,orderRef,originalSaleAmount,paidToPublisher,paymentId,publisherId,publisherUrl,saleAmount.amount,saleAmount.currency,siteName,transactionDate,transactionDevice,transactionParts,transactionQueryId,type,url,validationDate,voucherCode,voucherCodeUsed,partition_0
AT,123456,,false,2018-09-05T16:31:00,iPhone,"asdsdedrfrgthyjukiloujhrdf45654565423212",www.website.at,1.5,EUR,,pending,,AT,,321547896,-27670654789123380,68,,,,,false,0,654987,,1.0,EUR,https://www.site.at,2018-09-05T16:32:00,iPhone,1,0,Lead,https://www.website.at,,,false,advertise

Another table for transaction parts:
id,index,"transactionParts.val.amount","transactionParts.val.commissionAmount","transactionParts.val.commissionGroupCode","transactionParts.val.commissionGroupId","transactionParts.val.commissionGroupName"
1,0,1.0,1.5,LEAD,654654,Lead
Glue generated a primary key column named "transactionParts" in the base table, and the id in the transaction parts table is the foreign key to that column. As you can see, it preserved the original id column as is.
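To make the key relationship concrete, here is a minimal sketch in plain Python (no Glue needed) that joins the two outputs above on root "transactionParts" = child "id". The rows are abbreviated from the CSV shown above, and the function name join_parts and the output field names are made up for illustration; they are not part of the Glue API.

```python
# Rows abbreviated from the CSV output above; only a few columns are kept.
root_rows = [
    {"id": 321547896, "advertiserId": 123456, "transactionParts": 1},
]
part_rows = [
    {"id": 1, "index": 0,
     "transactionParts.val.amount": 1.0,
     "transactionParts.val.commissionGroupCode": "LEAD"},
]


def join_parts(root_rows, part_rows):
    """Hypothetical helper: match child rows to their parent row using
    root["transactionParts"] == child["id"]."""
    # Index the child rows by their "id" column (the generated foreign key).
    parts_by_key = {}
    for part in part_rows:
        parts_by_key.setdefault(part["id"], []).append(part)

    joined = []
    for root in root_rows:
        # The "transactionParts" column of the root row holds the generated key.
        for part in parts_by_key.get(root["transactionParts"], []):
            joined.append({
                "root_id": root["id"],  # the original id, preserved in the root table
                "part_index": part["index"],
                "amount": part["transactionParts.val.amount"],
            })
    return joined


joined = join_parts(root_rows, part_rows)
print(joined)
```

Note that the child table's id column is not the original record id. The original id stays in the root table, and the child rows point back to the root through the generated key.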
Can you please try the code on your data and see if it works (changing the source table name to match yours)? Try writing to S3 as CSV first to check whether that works. Please let me know your findings.
Source: https://stackoverflow.com/questions/52537132/when-using-relationalize-in-glue-there-is-no-id-in-root-table