Question
Hi, I'm using PySpark interactively, and I think I'm failing to load a local file correctly.
How do I check the current working directory, so that I can browse to it and take a look at the actual file?
Or is the default directory wherever pyspark was started? Thanks
Answer 1:
You can't load a local file unless the same file exists on all workers under the same path. For example, if you want to read a data.csv file in Spark, copy this file to every worker at the same path (say /tmp/data.csv). Then you can use sc.textFile("file:///tmp/data.csv") to create an RDD.
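A minimal sketch of what this looks like in the pyspark shell, assuming the file has already been copied to /tmp/data.csv on the driver and on every worker (the path and file name are just the example from above):

    # In the pyspark shell, `sc` (the SparkContext) is already defined.
    # Assumes /tmp/data.csv exists at the same path on the driver and all workers.
    rdd = sc.textFile("file:///tmp/data.csv")

    # Sanity-check that the file was actually read.
    print(rdd.count())   # number of lines
    print(rdd.take(3))   # first few lines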
The current working directory is the folder from which you started pyspark. You can start pyspark with IPython and run the pwd command to check the working directory. [Set PYSPARK_DRIVER_PYTHON=/path/to/ipython in spark-env.sh to use IPython.]
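If you don't want to switch to IPython, a quick alternative sketch is to use the standard library's os module directly inside the plain pyspark shell (this is ordinary Python, not a PySpark API):

    import os

    # Show the driver's current working directory, i.e. the folder
    # pyspark was started from.
    print(os.getcwd())

    # List the files there so you can confirm the local file you want
    # to load is actually present.
    print(os.listdir(os.getcwd()))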
Source: https://stackoverflow.com/questions/36995196/pyspark-how-to-show-current-directory