I am trying to process JSON events received in a mobile app (like clicks etc.) using Spark 1.5.2. There are multiple app versions and the structure of the event
Given two messages M1 and M2 like:
case class Ev1(app1: String)
case class M1(ts: String, ev1: Ev1)
case class Ev2(app2: String)
case class M2(ts: String, ev2: Ev2)
and two data frames, df1 (containing M1) and df2 (containing M2), both registered as temp tables, you can use SQL:
val merged = sqlContext.sql(
"""
|select
| df1.ts as ts,
| named_struct('app', df1.ev1.app1) as ev
| from
| df1
|
|union all
|
|select
| df2.ts as ts,
| named_struct('app', df2.ev2.app2) as ev
| from
| df2
""".stripMargin)
Key points:
- as to give the same names
- named_struct to build compatible nested structs on the fly
- union all to put it all together

Not shown in the example, but functions like collect_list might be useful as well.
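If you prefer the DataFrame API to a SQL string, the same idea can probably be expressed roughly as below. This is an untested sketch, not something I have run against 1.5.2: the object name MergeEvents, the helper merge, the local master setting and the sample rows are all made up for illustration, and I am assuming that struct from org.apache.spark.sql.functions picks up the alias as the field name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, struct}

// Same message shapes as in the answer.
case class Ev1(app1: String)
case class M1(ts: String, ev1: Ev1)
case class Ev2(app2: String)
case class M2(ts: String, ev2: Ev2)

// Hypothetical wrapper object, only here to make the sketch self-contained.
object MergeEvents {

  // Project each frame onto the shared shape (ts, ev: struct<app>),
  // then concatenate the two projections.
  def merge(df1: DataFrame, df2: DataFrame): DataFrame = {
    val left  = df1.select(col("ts"), struct(col("ev1.app1").as("app")).as("ev"))
    val right = df2.select(col("ts"), struct(col("ev2.app2").as("app")).as("ev"))
    // unionAll is the Spark 1.x name; Spark 2.0 deprecated it in favour of union.
    left.unionAll(right)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying this out.
    val conf = new SparkConf().setAppName("merge-events").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample rows, for illustration only.
    val df1 = sc.parallelize(Seq(M1("t1", Ev1("a")))).toDF()
    val df2 = sc.parallelize(Seq(M2("t2", Ev2("b")))).toDF()

    // Only the SQL variant needs these, since it refers to the frames by table name.
    df1.registerTempTable("df1")
    df2.registerTempTable("df2")

    val merged = merge(df1, df2)
    merged.printSchema()
    merged.show()

    sc.stop()
  }
}
```

The important part is that both projections end up with identical column names and struct field names before the union, which is the same job that as and named_struct do in the SQL version.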