I have the same problem with @.

In our case, we solved it by flattening the DataFrame:
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import scala.util.matching.Regex

// Collapse any run of '_', '.', ':' or '@' into a single '_'
val ALIAS_RE: Regex = "[_.:@]+".r
// Strip the leading '_' left behind when a field starts with '@'
val FIRST_AT_RE: Regex = "^_".r

def getFieldAlias(field_name: String): String = {
  FIRST_AT_RE.replaceAllIn(ALIAS_RE.replaceAllIn(field_name, "_"), "")
}

def selectFields(df: DataFrame, fields: List[String]): DataFrame = {
  var fields_to_select = List[Column]()
  for (field <- fields) {
    val alias = getFieldAlias(field)
    fields_to_select :+= col(field).alias(alias) // append, so the columns keep the order of `fields`
  }
  df.select(fields_to_select: _*)
}
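As a quick check (just a sketch, assuming the definitions above are in scope), the alias function turns nested paths into Spark-friendly column names:

getFieldAlias("schema.@type")   // "schema_type"
getFieldAlias("schema.name@id") // "schema_name_id"
getFieldAlias("@type")          // "type" (the leading @ is dropped)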
So the following JSON:

{
  "object": "blabla",
  "schema": {
    "@type": "blabla",
    "name@id": "blabla"
  }
}
That gives you the field paths [object, schema.@type, schema.name@id]. The @ and the dots (in your case, the =) create problems for Spark SQL, so after our selectFields you end up with [object, schema_type, schema_name_id]: a flattened DataFrame.
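A minimal end-to-end sketch, assuming a local SparkSession and that the JSON above sits in a file named data.json (a hypothetical path):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("flatten-json")
  .master("local[*]")
  .getOrCreate()

// multiLine is needed because the JSON object spans several lines
val df = spark.read.option("multiLine", "true").json("data.json")

// Flatten with the helper defined earlier
val flat = selectFields(df, List("object", "schema.@type", "schema.name@id"))

flat.printSchema()
// Expected (roughly):
// root
//  |-- object: string (nullable = true)
//  |-- schema_type: string (nullable = true)
//  |-- schema_name_id: string (nullable = true)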