Spark Dataframe validating column names for parquet writes (scala)
I'm processing events using DataFrames converted from a stream of JSON events, which eventually get written out in Parquet format. However, some of the JSON events contain spaces in their keys, which I want to log, and I want to filter/drop such events from the DataFrame before converting it to Parquet, because ,;{}()\n\t= are considered special characters in the Parquet schema (CatalystSchemaConverter), as listed in [1] below, and thus should not be allowed in column names. How can I perform such validation?
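As a rough sketch of what I have in mind (the helper names are my own, and I'm assuming the forbidden set is the characters above plus the space character; in Spark one would pass `df.columns` in):

```scala
// Hypothetical helper to validate column names against the characters
// that Parquet/CatalystSchemaConverter rejects (plus space, assumed here).
object ColumnNameValidator {
  private val forbidden: Set[Char] = " ,;{}()\n\t=".toSet

  // A name is valid if it contains none of the forbidden characters
  def isValid(name: String): Boolean = !name.exists(forbidden.contains)

  // Split column names into (valid, invalid) so the invalid ones can be
  // logged and dropped from the DataFrame before the Parquet write
  def partitionColumns(columns: Seq[String]): (Seq[String], Seq[String]) =
    columns.partition(isValid)
}

// Example usage with plain strings (with a DataFrame, use df.columns.toSeq)
val (ok, bad) = ColumnNameValidator.partitionColumns(Seq("event_id", "user name", "ts"))
println(s"valid: $ok, invalid: $bad")
```

I'm not sure whether checking `df.columns` like this is the idiomatic approach, or whether Spark exposes a hook to validate the schema before the write.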