Question
I am creating a DataFrame in Spark by loading tab-separated files from S3. I need the input file name of each record in the DataFrame for further processing. I tried:
dataframe.select(inputFileName())
But I am getting a null value for input_file_name. Could somebody please help me solve this issue?
Answer 1:
You can create a new column on the DataFrame using withColumn and input_file_name():
dataframe.withColumn("input_file", input_file_name())
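For context, here is a minimal end-to-end sketch of that call in Scala. It assumes Spark 2.x or later (where the function is named input_file_name and SparkSession is available) and a job launched with spark-submit. The bucket path, application name, and object name are hypothetical placeholders, not taken from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object InputFileNameExample {
  def main(args: Array[String]): Unit = {
    // Assumes the master URL is supplied externally, e.g. by spark-submit.
    val spark = SparkSession.builder()
      .appName("input-file-name-example")
      .getOrCreate()

    // Hypothetical S3 location; replace with the real path to the TSV files.
    val dataframe = spark.read
      .option("sep", "\t") // tab-separated input
      .csv("s3a://example-bucket/path/to/tsv/")

    // Add the source file of each record as a new string column.
    val withFile = dataframe.withColumn("input_file", input_file_name())

    // Print a few rows without truncating the (often long) file paths.
    withFile.show(5, truncate = false)

    spark.stop()
  }
}
```

Keeping the file name as a regular column means it travels with each row, so it can be used in a later groupBy or filter like any other field.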
Source: https://stackoverflow.com/questions/39970738/how-to-get-input-file-name-of-a-record-in-spark-dataframe