Spark file system watcher not working on Windows

Question

Two people tested Apache Spark on their computers...

We downloaded the version of Spark pre-built for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, and ran:

$ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp

I added arbitrary files content1 and content2dssdgdg to that "tmp" directory.

-------------------------------------------
Time: 1467921704000 ms
-------------------------------------------
(content1,1)
(content2dssdgdg,1)

-------------------------------------------
Time: 1467921706000 ms

Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.

Does Spark's file system watcher not work on Windows?
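
One detail worth controlling when reproducing this test on both machines: the Spark Streaming programming guide says that files must be created in the monitored directory by atomically moving or renaming them into it. The Python sketch below is one way to add test files like that on either operating system. The function name drop_file and the ".part" suffix are made up for illustration, and the sketch assumes that the parent of the watched directory is on the same file system as the directory itself. It does not by itself explain the difference seen on Windows.

```python
import os
import tempfile


def drop_file(watched_dir, name, text):
    """Write `text` to a temporary file, then rename it into `watched_dir`.

    Hypothetical helper: the file is fully written before it becomes
    visible inside the watched directory.
    """
    os.makedirs(watched_dir, exist_ok=True)
    # Stage the file in the parent of the watched directory, which is
    # assumed to be on the same file system so the rename is a single step.
    staging_dir = os.path.dirname(os.path.abspath(watched_dir))
    fd, staging_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    target = os.path.join(watched_dir, name)
    # os.replace renames the file, overwriting any existing target.
    os.replace(staging_path, target)
    return target
```

For example, drop_file("tmp", "sample.txt", "hello world\n") run from the Spark folder would place sample.txt into the "tmp" directory after HdfsWordCount has started.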

Answer 1:

John, I would suggest using the Hadoop binaries compiled for 64-bit Windows 7 that are hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, you need the Spark build that is pre-built for user-provided Hadoop. Make sure to set SPARK_DIST_CLASSPATH as described at https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on PATH. Once that is set up, follow steps 3.1, 3.3, 3.4, and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, pass hdfs:///tmp as the directory path argument. All the best.
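
Before starting HDFS, it may help to confirm that the environment variables named above are actually visible to the process. The following Python sketch is only a rough pre-flight check: the function name check_hadoop_env is made up for illustration, and it assumes HADOOP_HOME is set as an ordinary environment variable. It only checks that the values are present, not that they are correct.

```python
import os


def check_hadoop_env(env):
    """Return a list of problems found in `env`, a dict of variables.

    Hypothetical helper: checks only the settings named in the answer.
    """
    problems = []
    hadoop_home = env.get("HADOOP_HOME", "")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    if not env.get("SPARK_DIST_CLASSPATH", ""):
        problems.append("SPARK_DIST_CLASSPATH is not set")
    if hadoop_home:
        # The native libraries are expected under HADOOP_HOME/lib/native.
        native_dir = os.path.join(hadoop_home, "lib", "native")
        wanted = os.path.normcase(os.path.normpath(native_dir))
        entries = [
            os.path.normcase(os.path.normpath(entry))
            for entry in env.get("PATH", "").split(os.pathsep)
            if entry
        ]
        if wanted not in entries:
            problems.append(native_dir + " is not on PATH")
    return problems
```

Calling check_hadoop_env(dict(os.environ)) in the same console that will launch Spark returns an empty list when all three settings are present.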

Source: https://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows

Tags

windows

Ubuntu

apache-spark

filesystemwatcher