display DataFrame when using pyspark aws glue

Submitted by 对着背影说爱祢 on 2020-01-16 19:34:09

Question


How can I display a DataFrame from an AWS Glue ETL job?

I tried the code below, but it doesn't display anything.

df.show()

Code:

# Imports missing from the original snippet:
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "flux-test", table_name = "tab1", transformation_ctx = "datasource0")
sourcedf = ApplyMapping.apply(frame = datasource0, mappings = [("id", "long", "id", "long"), ("Rd.Id_Releve", "string", "Rd.Id_R", "string")])
sourcedf = sourcedf.toDF()

# Target schema: a struct column PM with two string fields, Pf and Rd.
schema = StructType([
    StructField('PM', StructType([
        StructField('Pf', StringType(), True),
        StructField('Rd', StringType(), True),
    ])),
])

cibledf = sqlCtx.createDataFrame(sourcedf.rdd.map(lambda x: Row(PM=Row(Pf=str(x.id_prm), Rd=None))), schema)
cibledf.show()  # show() prints to stdout and returns None, so wrapping it in print() only adds a stray "None"
job.commit()

Answer 1:


In the Glue console, after you run your Glue job, the job listing has a Logs / Error logs column.

Click on Logs; this takes you to the CloudWatch logs associated with your job. Browse them for the output of your print statement.

Also please check here: convert the DynamicFrame to a DataFrame and call show().

Added a working/tested code sample:

zipcode_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
       database = "customer_db",
       table_name = "zipcode_master")
zipcode_dynamicframe.printSchema()
zipcode_dynamicframe.toDF().show(10)

[Screenshot of the zipcode_dynamicframe output in the CloudWatch log omitted]

Source: https://stackoverflow.com/questions/59471577/display-dataframe-when-using-pyspark-aws-glue
