问题
In pandas, this can be done by column.name.
But how to do the same when its column of spark dataframe?
e.g. The calling program has a spark dataframe: spark_df
>>> spark_df.columns
['admit', 'gre', 'gpa', 'rank']
This program calls my function: my_function(spark_df['rank']) In my_function, I need the name of the column i.e. 'rank'
If it was pandas dataframe, we can use inside my_function
>>> pandas_df['rank'].name
'rank'
回答1:
You can get the names from the schema by doing
spark_df.schema.names
Printing the schema can be useful to visualize it as well
spark_df.printSchema()
回答2:
The only way is to go an underlying level to the JVM.
df.col._jc.toString().encode('utf8')
This is also how it is converted to a str
in the pyspark code itself.
From pyspark/sql/column.py:
def __repr__(self):
return 'Column<%s>' % self._jc.toString().encode('utf8')
回答3:
If you want the column names of your dataframe, you can use the pyspark.sql
class. I'm not sure if the SDK supports explicitly indexing a DF by column name. I received this traceback:
>>> df.columns['High']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
However, calling the columns method on your dataframe, which you have done, will return a list of column names:
df.columns
will return ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
If you want the column datatypes, you can call the dtypes
method:
df.dtypes
will return [('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'), ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'), ('Adj Close', 'double')]
If you want a particular column, you'll need to access it by index:
df.columns[2]
will return 'High'
回答4:
I found the answer is very very simple...
// It is in java, but it should be same in pyspark
Column col = ds.col("colName"); //the column object
String theNameOftheCol = col.toString();
The variable "theNameOftheCol" is "colName"
来源:https://stackoverflow.com/questions/39746752/how-to-get-name-of-dataframe-column-in-pyspark