pyspark: how to show current directory?

Submitted by 吃可爱长大的小学妹 on 2019-12-21 20:26:56

Question


Hi, I'm using pyspark interactively. I think I'm failing to load a LOCAL file correctly.

How do I check the current directory, so that I can go to a file browser and take a look at the actual file?

Or is the default directory the one where pyspark is installed? Thanks


Answer 1:


When running on a cluster, you can't load a local file unless the same file exists on all workers under the same path. For example, if you want to read a data.csv file in Spark, copy it to all workers under the same path (say /tmp/data.csv). You can then use sc.textFile("file:///tmp/data.csv") to create an RDD.
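As a rough sketch of the path handling described here, the snippet below builds an absolute file:// URI from a local path and checks whether the file is present on the machine running the driver. The helper names local_file_uri and exists_on_driver are hypothetical, chosen only for illustration, and the check says nothing about whether the workers also have the file.

```python
from pathlib import Path


def local_file_uri(path):
    """Return an absolute file:// URI for a path on the local filesystem.

    Hypothetical helper for illustration; not part of pyspark.
    """
    return Path(path).expanduser().resolve().as_uri()


def exists_on_driver(path):
    """Check that the file exists on THIS machine only (not on the workers)."""
    return Path(path).expanduser().is_file()


# /tmp/data.csv is the example path used in the answer above.
uri = local_file_uri("/tmp/data.csv")
print("URI to pass to Spark:", uri)
print("Present on the driver machine:", exists_on_driver("/tmp/data.csv"))

# Inside the pyspark shell, where `sc` is already defined, the URI could
# then be used like this:
#   rdd = sc.textFile(uri)
```

Spelling out the file:// scheme is the point of the helper: a path given without a scheme may be resolved against the cluster's default filesystem (for example HDFS) rather than the local disk, depending on how Spark is configured.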

The current working directory is the folder from which you started pyspark. You can start pyspark with IPython and run the pwd command to check the working directory. (Set PYSPARK_DRIVER_PYTHON=/path/to/ipython in spark-env.sh to use IPython.)
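If IPython is not available, the working directory can also be checked from the plain pyspark prompt with Python's standard os module, since the shell is an ordinary Python interpreter on the driver. This is a minimal sketch; the helper name resolve_relative and the file name data.csv are only illustrative.

```python
import os

# The driver's current working directory: the folder pyspark was started from.
cwd = os.getcwd()
print("Current working directory:", cwd)

# List what is in that folder, to confirm the file you expect is there.
print("Files here:", sorted(os.listdir(cwd)))


def resolve_relative(path):
    """Show which absolute path a relative file name refers to on the driver.

    Hypothetical helper for illustration.
    """
    return os.path.abspath(path)


print("data.csv would resolve to:", resolve_relative("data.csv"))
```

This only reports the directory of the driver process. It does not tell you anything about the working directory or the files on the worker nodes.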



Source: https://stackoverflow.com/questions/36995196/pyspark-how-to-show-current-directory
