Spark file system watcher not working on Windows

Submitted by 你离开我真会死。 on 2020-05-17 04:40:27

Question


Two people tested Apache Spark on their computers...

We downloaded the version of Spark pre-built for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, and ran:

$ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp

I added arbitrary files content1 and content2dssdgdg to that "tmp" directory.

-------------------------------------------
Time: 1467921704000 ms
-------------------------------------------
(content1,1)
(content2dssdgdg,1)

-------------------------------------------
Time: 1467921706000 ms

On my Ubuntu 15.10 laptop, Spark detected those files and printed the terminal output above. On my colleague's Windows 7 Enterprise laptop, it did not detect them.

Does Spark's file system watcher not work on Windows?


Answer 1:


John, I would suggest using the Hadoop binaries compiled for 64-bit Windows 7, hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, you need the Spark build that is pre-built for user-provided Hadoop. Make sure to set SPARK_DIST_CLASSPATH as described at https://spark.apache.org/docs/latest/hadoop-provided.html, and also add %HADOOP_HOME%\lib\native to PATH. Once that is set up, follow steps 3.1, 3.3, 3.4 and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, pass hdfs:///tmp as the directory path argument. All the best.
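
For context on why the directory argument matters, here is a minimal sketch of what the word count job does, modeled on the bundled HdfsWordCount example. The object name DirectoryWordCount and the fallback to hdfs:///tmp are assumptions made for illustration, not part of Spark. The 2-second batch interval is inferred from the spacing of the timestamps in the question's output. The sketch assumes it is launched through spark-submit, which supplies the master URL.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name; the logic follows the bundled HdfsWordCount example.
object DirectoryWordCount {
  def main(args: Array[String]): Unit = {
    // Assumption: fall back to the HDFS path suggested in the answer.
    val directory = if (args.nonEmpty) args(0) else "hdfs:///tmp"

    val conf = new SparkConf().setAppName("DirectoryWordCount")
    // 2-second batches, matching the timestamps shown in the question.
    val ssc = new StreamingContext(conf, Seconds(2))

    // Monitor the directory; each batch reads files that newly appeared in it.
    val lines = ssc.textFileStream(directory)

    // Split every line on spaces and count the occurrences of each word.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The directory string is handed to textFileStream as-is, so a path such as hdfs:///tmp is resolved against the local HDFS started in the steps above rather than against the Windows file system.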



Source: https://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows
