How to get the input file name of a record in a Spark DataFrame?

Posted by 三世轮回 on 2019-12-11 05:47:52

Question


I am creating a DataFrame in Spark by loading tab-separated files from S3. I need the input file name of each record in the DataFrame for further processing. I tried:

dataframe.select(inputFileName())

But I am getting a null value for input_file_name. Could somebody please help me solve this issue?


Answer 1:


You can create a new column on the data frame using withColumn and input_file_name():

dataframe.withColumn("input_file", input_file_name())
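For context, here is a minimal PySpark sketch of the same idea. It assumes Spark 2.x or later for the `spark.read.csv` reader. The helper names `file_basename`, `with_input_file` and `demo`, and the application name, are made up for illustration and are not part of the original answer. The `input_file_name()` column is added directly to the DataFrame that reads the files, since the file name may not be available once the data has passed through other transformations.

```python
import posixpath
from urllib.parse import urlparse


def file_basename(uri):
    """Return the last path component of a file URI such as an s3:// path."""
    return posixpath.basename(urlparse(uri).path)


def with_input_file(df, column="input_file"):
    """Return df with an extra column holding each record's source file URI."""
    # Imported lazily so that file_basename stays usable without pyspark installed.
    from pyspark.sql.functions import input_file_name
    return df.withColumn(column, input_file_name())


def demo(path):
    """Read tab-separated files under `path` and list the distinct source file names."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("input-file-demo").getOrCreate()
    # Hypothetical input: `path` points at one or more tab-separated files.
    df = spark.read.option("sep", "\t").csv(path)
    tagged = with_input_file(df)
    rows = tagged.select("input_file").distinct().collect()
    return sorted(file_basename(r["input_file"]) for r in rows)
```

Calling `demo(...)` with the location of the tab-separated files should return the names of the files that contributed at least one record; the location itself is whatever path your cluster can read.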


Source: https://stackoverflow.com/questions/39970738/how-to-get-input-file-name-of-a-record-in-spark-dataframe
