Cannot resolve column (numeric column name) in Spark Dataframe


Question


This is my data:

scala> data.printSchema
root
 |-- 1.0: string (nullable = true)
 |-- 2.0: string (nullable = true)
 |-- 3.0: string (nullable = true)

This doesn't work :(

scala> data.select("2.0").show

Exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`2.0`' given input columns: [1.0, 2.0, 3.0];;
'Project ['2.0]
+- Project [_1#5608 AS 1.0#5615, _2#5609 AS 2.0#5616, _3#5610 AS 3.0#5617]
   +- LocalRelation [_1#5608, _2#5609, _3#5610]
        ...

Try this at home (I'm running on the shell v_2.1.0.5)!

val data = spark.createDataFrame(Seq(
  ("Hello", ", ", "World!")
)).toDF("1.0", "2.0", "3.0")
data.select("2.0").show

Answer 1:


You can use backticks to escape the dot, which is otherwise reserved for accessing fields of struct-type columns:

data.select("`2.0`").show
+---+
|2.0|
+---+
| , |
+---+
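To see why the dot is special, here is a minimal sketch with an actual struct column (the people/info names and values are made up for illustration): without backticks, Spark parses everything after the dot as a struct field.

val people = spark.createDataFrame(Seq(
  ("Alice", ("Paris", 75000))
)).toDF("name", "info")           // "info" becomes a struct<_1: string, _2: int>

people.select("info._1").show     // here the dot drills into the struct field

So data.select("2.0") is read as "field 0 of a column named 2", which does not exist, while the backticked name is treated as a single identifier.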



Answer 2:


The problem is that you cannot use a dot character in a column name directly when selecting from a DataFrame, because Spark interprets the dot as a field accessor. You can have a look at this question, which is similar.

val data = spark.createDataFrame(Seq(
  ("Hello", ", ", "World!")
)).toDF("1.0", "2.0", "3.0")

// Wrap the column name in backticks so the dot is not parsed as a field accessor.
// (Define the helper before using it if you are pasting this into the shell.)
def sanitize(input: String): String = s"`$input`"

data.select(sanitize("2.0")).show
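As a follow-up (not part of the original answer), the same helper can be mapped over all column names when every one of them contains a dot:

val escaped = data.columns.map(sanitize)
data.select(escaped.head, escaped.tail: _*).show   // selects 1.0, 2.0 and 3.0 in one go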


Source: https://stackoverflow.com/questions/42698322/cannot-resolve-column-numeric-column-name-in-spark-dataframe
