AnalysisException is thrown when the DataFrame is empty (No such struct field)

Posted by [亡魂溺海] on 2020-01-04 09:48:08

Question


I have a dataframe on which I apply a filter and then a series of transformations. At the end, I select several columns.

import org.apache.spark.sql.functions.{col, concat, lit, substring, when}

// Keep only the events related to a user principal.
var filteredCount = events.filter("Properties.EventTypeName == 'user_principal_created' or Properties.EventTypeName == 'user_principal_updated'")
  // Take each column from Body for "created" events and from Body.NewValue for "updated" events.
  .withColumn("Username", when(col("Properties.EventTypeName") === lit("user_principal_created"), col("Body.Username"))
    .otherwise(col("Body.NewValue.Username")))
  .withColumn("FirstName", when(col("Properties.EventTypeName") === lit("user_principal_created"), col("Body.FirstName"))
    .otherwise(col("Body.NewValue.FirstName")))
  .withColumn("LastName", when(col("Properties.EventTypeName") === lit("user_principal_created"), col("Body.LastName"))
    .otherwise(col("Body.NewValue.LastName")))
  .withColumn("PrincipalId", when(col("Properties.EventTypeName") === lit("user_principal_created"), col("Body.PrincipalId"))
    .otherwise(col("Body.NewValue.PrincipalId")))
  .withColumn("TenantId", when(col("Properties.EventTypeName") === lit("user_principal_created"), col("Body.TenantId"))
    .otherwise(col("Body.NewValue.TenantId")))
  .withColumnRenamed("Timestamp", "LastChangeTimestamp")
  // Create the custom primary key: "<TenantId>-<PrincipalId>", at most 128 characters (substring positions are 1-based).
  .withColumn("PrincipalUserId", substring(concat(col("TenantId"), lit("-"), col("PrincipalId")), 1, 128))
  // Select the output columns.
  .select("PrincipalUserId", "TenantId", "PrincipalId", "FirstName", "LastName", "Username", "LastChangeTimestamp")

It works only if there are rows in events that match the filter. If no row matches the filter, I get the following exception:

org.apache.spark.sql.AnalysisException: No such struct field Username in...

Question

What can I do to handle such a scenario and prevent withColumn from failing?

Update

Here is the logical plan when it works:

== Analyzed Logical Plan ==
Body: struct,CitationNumber:string,Color:string,CommitReference:string,ContactAddress:struct,ControlId:string,Data:string,Dependencies:array>,Description:string,DeviceId:string,Error:bigint,ErrorDetails:string,Exemption:struct,ExternalId:string,FeatureId:string,Features:array,FirstName:string,GroupPrincipals:array,GroupType:bigint,Id:bigint,IsAuthorized:boolean,IsDedicatedStorage:boolean,IsEnabled:boolean,IsInitialCreation:boolean,... 33 more fields>, Id: string, Properties: struct, Timestamp: string
Relation[Body#248,Id#249,Properties#250,Timestamp#251] json

And when the exception is thrown:

== Analyzed Logical Plan ==
Body: struct,Id:bigint,IsAuthorized:boolean,Latitude:double,Longitude:double,Name:string,NewValue:struct,OldValue:struct,Ordinal:bigint,ParentZoneId:string,PrincipalId:bigint,PrincipalName:string,Requirements:array,FeatureId:string,RequirementId:string,ServiceId:string>>,FeatureId:string,RequirementId:string,ServiceId:string>>,RestrictedZoneId:bigint,StreetName:string,TenantId:string,Timestamp:string,... 2 more fields>, Id: string, Properties: struct, Timestamp: string
Relation[Body#44,Id#45,Properties#46,Timestamp#47] json
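
Both plans read from a json relation, and the Body struct differs between them. That suggests the schema is inferred from whatever data happens to be loaded, so Body.Username may simply not exist in the schema when no matching event is present. One possible guard, sketched here under that assumption, checks the schema before referencing a nested field and falls back to a typed null. hasField and colOrNull are hypothetical helper names, not Spark APIs, and the name comparison is case-sensitive, unlike Spark's default column resolution.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper (not part of Spark): walks the schema along a dotted
// path such as "Body.NewValue.Username" and reports whether every segment
// resolves to a struct field. Paths that pass through arrays are not handled.
def hasField(schema: StructType, path: String): Boolean = {
  def loop(current: DataType, parts: List[String]): Boolean = (current, parts) match {
    case (_, Nil) => true
    case (s: StructType, head :: tail) =>
      s.fields.find(_.name == head) match {
        case Some(field) => loop(field.dataType, tail)
        case None        => false
      }
    case _ => false // a non-struct type cannot contain further fields
  }
  loop(schema, path.split('.').toList)
}

// Hypothetical helper: returns the nested column when it exists in the
// DataFrame's schema, otherwise a null literal cast to the expected type,
// so that a later withColumn does not fail during analysis.
def colOrNull(df: DataFrame, path: String, dataType: DataType): Column =
  if (hasField(df.schema, path)) col(path) else lit(null).cast(dataType)
```

With these helpers, col("Body.Username") in the chain above could become colOrNull(events, "Body.Username", StringType), and likewise for the other nested fields. Supplying an explicit schema through DataFrameReader.schema when reading the JSON is another way to keep the struct fields stable regardless of the data.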

Source: https://stackoverflow.com/questions/55417344/analysisexception-is-thrown-when-the-dataframe-is-empty-no-such-struct-field
