Question
I am trying to create a data frame from JSON and write it out in parquet format. I am getting the following exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Attribute name "d?G?@4???[[l?~?N!^w1 ?X!8??ingSuccessful" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
I know that the exception is caused by JSON keys that contain special characters. However, I do not know how many keys have special characters.
One possible solution is to replace the special characters in the keys with underscores or blanks while creating the RDD and reading line by line.
I am creating the parquet file with the following code:
dataDf.coalesce(1)
.write
.partitionBy("year", "month", "day", "hour")
.option("header", "true")
.option("delimiter", "\t")
.format("parquet")
.save("events")
Answer 1:
I know that the exception is caused by JSON keys that contain special characters. However, I do not know how many keys have special characters.
If you don't know how many column names have special characters, use df.columns
to get all the column names and replace the special characters in each of them.
Renaming the columns this way before you write them to parquet files should solve the issue you are having.
I hope the answer is helpful.
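A minimal sketch of that approach. The `sanitize` helper name is an assumption, and the character class is taken directly from the set the exception lists as invalid (`" ,;{}()\n\t="`); `dataDf` refers to the asker's DataFrame:

```scala
// Hypothetical helper: replace every character that parquet rejects
// in attribute names (" ,;{}()\n\t=") with an underscore.
def sanitize(name: String): String =
  name.replaceAll("[ ,;{}()\\n\\t=]", "_")

// Sketch of applying it to the DataFrame (requires a SparkSession,
// so it is shown commented out here):
// val renamedDf = dataDf.toDF(dataDf.columns.map(sanitize): _*)
// renamedDf.write.partitionBy("year", "month", "day", "hour")
//   .parquet("events")
```

`Dataset.toDF(colNames: String*)` returns a new DataFrame with the given column names in order, so mapping `sanitize` over `df.columns` renames every column in one pass without having to know in advance which ones are affected.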
Source: https://stackoverflow.com/questions/48971566/how-to-handle-keys-in-json-with-special-characters-in-spark-parquet