Spark: how to use SparkContext.textFile for local file system

Asked by 太阳男子, 2020-12-06 01:36

I'm just getting started with Apache Spark (in Scala, but the language is irrelevant). I'm using standalone mode and I want to process a text file from the local file system.

6 Answers
  • 2020-12-06 01:43

    The proper way is to use three slashes: two for the URI syntax (just like in http://) and one for the mount point of the Linux file system, e.g. sc.textFile("file:///home/worker/data/my_file.txt"). If you are running in local mode, the plain path alone is sufficient. In the case of a standalone cluster, the file must be copied to each node. Note that the contents of the file must be exactly the same on every node, otherwise Spark returns inconsistent results.
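
    A minimal runnable sketch of this, assuming a standalone cluster where /home/worker/data/my_file.txt exists with identical contents on every node:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object LocalFileRead {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LocalFileRead"))

        // Three slashes: "file://" is the scheme, the third "/" starts the
        // absolute local path. The file must be present on every node.
        val lines = sc.textFile("file:///home/worker/data/my_file.txt")
        println(s"line count: ${lines.count()}")

        sc.stop()
      }
    }
    ```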

  • 2020-12-06 01:47

    Each node should contain a complete copy of the file. In that case, the local file system is logically indistinguishable from HDFS with respect to that file.

  • 2020-12-06 02:02

    Add "file:///" uri in place of "file://". This solved the issue for me.

  • 2020-12-06 02:04

    From Spark's FAQ page: if you don't use Hadoop/HDFS, then "if you run on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). If you have this type of filesystem, you can just deploy Spark in standalone mode."

    https://spark.apache.org/faq.html
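
    For illustration, a minimal sketch of that setup, assuming a hypothetical NFS share mounted at /mnt/shared on every node:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object SharedFsRead {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SharedFsRead"))

        // /mnt/shared is assumed to be the same NFS mount point on all
        // nodes, so a plain local-file URI works without HDFS.
        val data = sc.textFile("file:///mnt/shared/input.txt")
        println(s"partitions: ${data.partitions.length}")

        sc.stop()
      }
    }
    ```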

  • 2020-12-06 02:07

    Prepend file:// to your local file path, as in the sketch below.
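
    For example, assuming an existing SparkContext named sc and a hypothetical path (note that an absolute path starting with / yields the three-slash form mentioned above):

    ```scala
    val localPath = "/home/worker/data/my_file.txt"  // hypothetical path
    val rdd = sc.textFile("file://" + localPath)     // becomes file:///home/worker/...
    ```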

  • 2020-12-06 02:09

    Spark 1.6.1
    Java 1.7.0_99
    Nodes in cluster: 3 (HDP)

    Case 1: running in local mode (local[n])

    Both file:///.. and file:/.. read the file from the local system.

    Case 2:

    `--master yarn-cluster`
    

    Input path does not exist: for file:/ and file://

    And for file://

    java.lang.IllegalArgumentException :Wrong FS: file://.. expected: file:///
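
    To reproduce these cases, one could probe the three scheme variants like this (a sketch, assuming an existing SparkContext sc and a hypothetical path; the per-master outcomes follow the observations above):

    ```scala
    import scala.util.{Failure, Success, Try}

    val variants = Seq(
      "file:/home/worker/data/my_file.txt",
      "file://home/worker/data/my_file.txt",  // "home" parses as a URI authority
      "file:///home/worker/data/my_file.txt"
    )
    for (uri <- variants) {
      // count() forces evaluation, so read errors surface inside Try.
      Try(sc.textFile(uri).count()) match {
        case Success(n) => println(s"$uri -> ok, $n lines")
        case Failure(e) => println(s"$uri -> ${e.getClass.getSimpleName}: ${e.getMessage}")
      }
    }
    ```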
