In pandas, this can be done with column.name.
But how do you do the same when it's a column of a Spark dataframe?
e.g. the calling program has a Spark dataframe: spark_df
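For context, here is a minimal, hypothetical setup so the snippets below are runnable; the column names mirror the examples in the answer, and the values are made up:

from pyspark.sql import SparkSession

# Hypothetical setup: a tiny dataframe with the columns used in the answer.
spark = SparkSession.builder.appName("column-names-example").getOrCreate()
spark_df = spark.createDataFrame(
    [("2015-01-02", 111.4, 111.4, 107.3, 109.3, 53204626, 109.3)],
    ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"],
)
df = spark_df  # the answer below refers to this dataframe as df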
If you want the column names of your dataframe, you can use the pyspark.sql.DataFrame class. I'm not sure whether it supports indexing the column list by name; I received this traceback:
>>> df.columns['High']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
However, accessing the columns attribute on your dataframe, as you have done, will return a list of column names:
df.columns
will return ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
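Because the result is an ordinary Python list, it can be fed straight back into select(). A hedged sketch, assuming the same df, that keeps every column except 'Adj Close':

# Build a list of the names you want and select them in one go.
price_cols = [c for c in df.columns if c != 'Adj Close']
df.select(price_cols).show()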
If you want the column datatypes, you can use the dtypes attribute:
df.dtypes
will return [('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'), ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'), ('Adj Close', 'double')]
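Since dtypes is a list of (name, type) tuples, turning it into a dict gives a convenient name-to-type lookup; a small sketch, assuming the df above:

# Map each column name to its type string.
type_by_name = dict(df.dtypes)
type_by_name['High']    # 'double'
type_by_name['Volume']  # type string depends on how the data was loaded,
                        # e.g. 'int' as quoted above, or 'bigint'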
If you want a particular column name, you'll need to access it by its positional index in the list:
df.columns[2]
will return 'High'
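Going the other way, list.index() recovers a column's position from its name; a minimal sketch with the same df:

# Find where 'High' sits in the column list, then read it back.
idx = df.columns.index('High')  # 2
df.columns[idx]                 # 'High'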