Question
I am trying to create a data frame from JSON and write it out in parquet format. I am getting the following exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Attribute name "d?G?@4???[[l?~?N!^w1 ?X!8??ingSuccessful" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
I know that the exception is caused by JSON keys that contain special characters. However, I do not know how many keys have special characters.
One possible solution is to replace the special characters in the keys with underscores or blanks while creating the RDD and reading line by line.
I am creating the parquet file with the following code:
dataDf.coalesce(1)
.write
.partitionBy("year", "month", "day", "hour")
.option("header", "true")
.option("delimiter", "\t")
.format("parquet")
.save("events")
Answer 1:
I know that the exception is caused by JSON keys that contain special characters. However, I do not know how many keys have special characters.
If you don't know how many column names have special characters, use df.columns
to get all the column names and replace the special characters in each of them.
Renaming the columns this way before you write them to parquet files should solve the issue you are having.
I hope the answer is helpful.
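A minimal sketch of that approach. The `sanitize` helper name is an assumption, and the character class is taken directly from the set the exception lists as invalid (`" ,;{}()\n\t="`); `dataDf` refers to the asker's DataFrame:

```scala
// Hypothetical helper: replace every character that parquet rejects
// in attribute names (" ,;{}()\n\t=") with an underscore.
def sanitize(name: String): String =
  name.replaceAll("[ ,;{}()\\n\\t=]", "_")

// Sketch of applying it to the DataFrame (requires a SparkSession,
// so it is shown commented out here):
// val renamedDf = dataDf.toDF(dataDf.columns.map(sanitize): _*)
// renamedDf.write.partitionBy("year", "month", "day", "hour")
//   .parquet("events")
```

`Dataset.toDF(colNames: String*)` returns a new DataFrame with the given column names in order, so mapping `sanitize` over `df.columns` renames every column in one pass without having to know in advance which ones are affected.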
Source: https://stackoverflow.com/questions/48971566/how-to-handle-keys-in-json-with-special-characters-in-spark-parquet