Spark: Parquet DataFrame operations fail when forcing schema on read
Problem (Spark 2.0.2): The problem arises when you have Parquet files with different schemas and force a schema during read. Even though you can print the schema and run `show()` without error, you cannot apply any filtering logic to the missing columns. Here are the two example schemata:

```scala
// assuming you are running this code in a Spark REPL
import spark.implicits._

case class Foo(i: Int)
case class Bar(i: Int, j: Int)
```

So `Bar` includes all the fields of `Foo` and adds one more (`j`). In real life this arises
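For concreteness, here is a minimal sketch of the failure mode described above, assuming the REPL-provided `spark` session, the `Foo`/`Bar` case classes from the snippet, and hypothetical output paths under `/tmp`. It writes one Parquet directory per schema, then reads the `Foo` data back while forcing `Bar`'s wider schema:

```scala
// Minimal reproduction sketch. The /tmp paths are hypothetical placeholders;
// `spark`, Foo, and Bar are assumed from the REPL snippet above.
import spark.implicits._
import org.apache.spark.sql.Encoders

// Write one Parquet directory per schema.
Seq(Foo(1)).toDF.write.mode("overwrite").parquet("/tmp/foo")
Seq(Bar(1, 2)).toDF.write.mode("overwrite").parquet("/tmp/bar")

// Read the Foo data while forcing Bar's (wider) schema on read.
val df = spark.read
  .schema(Encoders.product[Bar].schema)
  .parquet("/tmp/foo")

df.printSchema()             // shows both i and j
df.show()                    // works: the missing column j comes back as null
df.filter($"j" === 2).show() // fails: j does not exist in the underlying file
```

A plausible culprit is Parquet filter pushdown, which can validate the pushed predicate against the file's own footer schema rather than the schema forced at read time, though the exact error depends on the Spark and Parquet versions in play.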