DataFrame Object is not showing any data

Posted by 坚强是说给别人听的谎言 on 2019-12-25 09:17:05

Question


I was trying to create a DataFrame from an HDFS file using the spark-csv library, as shown in this tutorial.

But when I tried to get the count of the DataFrame, it showed 0.

Here is what my file looks like:

employee.csv:

empid,empname
1000,Tom
2000,Jerry

I loaded the above file using:

val empDf = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("delimiter",",").load("hdfs:///user/.../employee.csv");

When I call empDf.printSchema(), it prints the proper schema, with empid and empname as string fields, and I can see that the delimiter was read properly.

But when I try to display the DataFrame with empDf.show, I get only the column header and no data, and empDf.count returns 0 records.

Please correct me if I missed a required step.


Answer 1:


Make sure that the spark-csv artifact you use is built for the same Scala version as your Spark distribution.

For example, if your Spark distro is built with Scala 2.10 (the default Scala version for Databricks prebuilt Spark distros), you will need spark-csv_2.10. The spark-csv_2.11 artifact (shown in the mentioned tutorial) will not work and will return an empty DataFrame with only column names; see my answer to this SO question for a similar case.
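One way to check which artifact suffix you need is to ask the running shell for its Scala version. The following is only a minimal sketch: sparkCsvArtifact is a hypothetical helper introduced here for illustration, and the com.databricks group ID is assumed from the package's published coordinates rather than taken from the question. Run it inside spark-shell, since a standalone Scala REPL reports its own Scala version, not the one Spark was built with.

```scala
// Full Scala version of the running REPL, for example "2.10.5" or "2.11.8".
val scalaVersion: String = scala.util.Properties.versionNumberString

// The binary version is the first two components ("2.10", "2.11", ...).
// Library artifacts carry it as a suffix: spark-csv_2.10, spark-csv_2.11.
val binaryVersion: String = scalaVersion.split('.').take(2).mkString(".")

// Hypothetical helper (not part of spark-csv): builds the artifact
// coordinate, without a release number, for a given Scala binary version.
def sparkCsvArtifact(binary: String): String =
  s"com.databricks:spark-csv_$binary"

println(s"Scala $scalaVersion -> use ${sparkCsvArtifact(binaryVersion)}")
```

The printed coordinate, with a release number appended, is what would be passed to spark-shell via --packages, for example --packages com.databricks:spark-csv_2.10:<version>.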



Source: https://stackoverflow.com/questions/38846422/dataframe-object-is-not-showing-any-data
