In pandas, this can be done with column.name.
But how do you do the same when it's a column of a Spark dataframe?
e.g. the calling program has a Spark dataframe: spark_df
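For context, here is a minimal, hypothetical setup so the snippets below are runnable; the column names mirror the examples in the answer, and the values are made up:

from pyspark.sql import SparkSession

# Hypothetical setup: a tiny dataframe with the columns used in the answer.
spark = SparkSession.builder.appName("column-names-example").getOrCreate()
spark_df = spark.createDataFrame(
    [("2015-01-02", 111.4, 111.4, 107.3, 109.3, 53204626, 109.3)],
    ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"],
)
df = spark_df  # the answer below refers to this dataframe as df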
If you want the column names of your dataframe, you can use the pyspark.sql.DataFrame class. I'm not sure whether it supports indexing the column list by name; I received this traceback:
>>> df.columns['High']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
However, accessing the columns attribute on your dataframe, as you have done, will return a list of column names:
df.columns
will return ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
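Because the result is an ordinary Python list, it can be fed straight back into select(). A hedged sketch, assuming the same df, that keeps every column except 'Adj Close':

# Build a list of the names you want and select them in one go.
price_cols = [c for c in df.columns if c != 'Adj Close']
df.select(price_cols).show()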
If you want the column datatypes, you can use the dtypes attribute:
df.dtypes
will return [('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'), ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'), ('Adj Close', 'double')]
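Since dtypes is a list of (name, type) tuples, turning it into a dict gives a convenient name-to-type lookup; a small sketch, assuming the df above:

# Map each column name to its type string.
type_by_name = dict(df.dtypes)
type_by_name['High']    # 'double'
type_by_name['Volume']  # type string depends on how the data was loaded,
                        # e.g. 'int' as quoted above, or 'bigint'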
If you want a particular column name, you'll need to access it by its positional index in the list:
df.columns[2]
will return 'High'
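Going the other way, list.index() recovers a column's position from its name; a minimal sketch with the same df:

# Find where 'High' sits in the column list, then read it back.
idx = df.columns.index('High')  # 2
df.columns[idx]                 # 'High'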