PySpark SQL with column name containing dash/hyphen in it

Submitted by 时间秒杀一切 on 2021-02-11 17:37:33

Question


I have a PySpark DataFrame df:

import pandas as pd

# `spark` is assumed to be an existing SparkSession
data = {'Passenger-Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},'Age': {0: 22, 1: 38, 2: 26, 3: 35, 4: 35}}
df_pd = pd.DataFrame(data, columns=data.keys())
df = spark.createDataFrame(df_pd)
+------------+---+
|Passenger-Id|Age|
+------------+---+
|           1| 22|
|           2| 38|
|           3| 26|
|           4| 35|
|           5| 35|
+------------+---+

This works:

   df.filter(df.Age == 22).show()

But the following doesn't work, because of the - in the column name:

    df.filter(df.Passenger-Id == 2).show()

AttributeError: 'DataFrame' object has no attribute 'Passenger'

I'm facing the same issue in Spark SQL too:

        spark.sql("SELECT  Passenger-Id FROM AutoMobile").show()

        spark.sql("SELECT  automobile.Passenger-Id FROM AutoMobile").show()

I get the following error:

AnalysisException: cannot resolve 'Passenger' given input columns: [automobile.Age, automobile.Passenger-Id]

I tried putting the column name in single quotes, as advised in some sources, but now it just prints the quoted name from the query as a string literal on every row:

  spark.sql("SELECT  'Passenger-Id' FROM AutoMobile").show()
+------------+
|Passenger-Id|
+------------+
|Passenger-Id|
|Passenger-Id|
|Passenger-Id|
|Passenger-Id|
|Passenger-Id|
+------------+

Answer 1:


Since you have a hyphen in the column name, I suggest using the col() function from pyspark.sql.functions:

import pyspark.sql.functions as F
df.filter(F.col('Passenger-Id')== 2).show()

Here is the result

+------------+---+
|Passenger-Id|Age|
+------------+---+
|           2| 38|
+------------+---+
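For context on why the attribute form in the question fails while the string passed to col() works: Python itself reads df.Passenger-Id as a subtraction, df.Passenger minus a variable named Id, before Spark ever sees it. The sketch below uses only the standard-library ast module to inspect how the expression is parsed; the variable names are illustrative and no Spark session is needed.

```python
import ast

# Parse the expression from the question without evaluating it.
tree = ast.parse("df.Passenger-Id == 2", mode="eval")
comparison = tree.body   # the `... == 2` comparison
left = comparison.left   # what Python sees to the left of `==`

# The left side is a subtraction: (df.Passenger) - (Id)
is_subtraction = isinstance(left, ast.BinOp) and isinstance(left.op, ast.Sub)
attribute_name = left.left.attr  # attribute looked up on df
right_name = left.right.id       # bare name being subtracted

print(is_subtraction, attribute_name, right_name)  # True Passenger Id
```

This matches the AttributeError above, which complains about 'Passenger' rather than 'Passenger-Id'. Passing the name as a string, as in F.col('Passenger-Id') or df['Passenger-Id'], avoids the attribute lookup entirely.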

Now, for SQL syntax, you need to wrap the column name in backticks ( ` ), not single quotes, as below:

df.createOrReplaceTempView("AutoMobile")
spark.sql("SELECT  * FROM AutoMobile where `Passenger-Id`=2").show()
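If column names arrive as variables, the backtick quoting can be applied programmatically. The following is a minimal sketch, not part of the original answer: quote_identifier and build_filter_query are hypothetical helper names, and the doubling of embedded backticks is an assumption about how Spark SQL escapes them that is worth checking against your Spark version.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks for Spark SQL (hypothetical helper)."""
    # Assumption: a backtick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def build_filter_query(table: str, column: str, value: int) -> str:
    """Build the same kind of query as above, with the column name quoted."""
    return f"SELECT * FROM {table} WHERE {quote_identifier(column)} = {int(value)}"


query = build_filter_query("AutoMobile", "Passenger-Id", 2)
print(query)  # SELECT * FROM AutoMobile WHERE `Passenger-Id` = 2
# With a live session and the temp view registered: spark.sql(query).show()
```

The helper only builds the query string, so it can be checked without a Spark session. The table name is inserted as-is and is assumed to be trusted.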


Source: https://stackoverflow.com/questions/63899261/pyspark-sql-with-column-name-containing-dash-hyphen-in-it
