How to get the name of a DataFrame column in PySpark?

深忆病人 · 2021-02-01 13:44

In pandas, this can be done by column.name.

But how can the same be done when it is a column of a Spark DataFrame?

e.g. the calling program has a Spark DataFrame: spark_df

5 Answers
  •  渐次进展
    2021-02-01 14:17

    If you want the column names of your DataFrame, you can use the pyspark.sql.DataFrame API. Note that the column list cannot be indexed by name; I received this traceback:

        >>> df.columns['High']
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: list indices must be integers, not str

    However, accessing the columns attribute on your DataFrame will return a list of column names:

    df.columns will return ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
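
    For reference, here is a minimal, self-contained sketch. The SparkSession app name, the explicit schema, and the single row of data are illustrative assumptions chosen to reproduce the column list above:

        import datetime

        from pyspark.sql import SparkSession
        from pyspark.sql.types import (
            DoubleType, IntegerType, StructField, StructType, TimestampType,
        )

        spark = SparkSession.builder.appName("column-names").getOrCreate()

        # Schema mirroring the stock-price columns in this answer; the one
        # row of data is made up purely for illustration.
        schema = StructType([
            StructField("Date", TimestampType()),
            StructField("Open", DoubleType()),
            StructField("High", DoubleType()),
            StructField("Low", DoubleType()),
            StructField("Close", DoubleType()),
            StructField("Volume", IntegerType()),
            StructField("Adj Close", DoubleType()),
        ])
        df = spark.createDataFrame(
            [(datetime.datetime(2021, 2, 1), 10.0, 12.0, 9.5, 11.0, 1000, 11.0)],
            schema,
        )

        print(df.columns)
        # ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']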

    If you want the column data types, you can use the dtypes attribute:

    df.dtypes will return [('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'), ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'), ('Adj Close', 'double')]
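
    Continuing the sketch above, dtypes yields the (name, type) pairs as plain Python strings:

        print(df.dtypes)
        # [('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'),
        #  ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'),
        #  ('Adj Close', 'double')]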

    If you want a particular column name, you'll need to access it by its position in the list:

    df.columns[2] will return 'High'
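
    Again continuing the sketch above, since df.columns is an ordinary Python list, the usual list operations apply:

        # df.columns is a plain Python list, so indexing, membership
        # tests, and iteration all work as usual.
        print(df.columns[2])         # 'High'
        print("High" in df.columns)  # True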
