I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (with user-provided Hadoop). When I ran the spark-shell command, I got this error:
> E
You should add these jars in your code:
Thank you so much. That worked great, but I had to add the Spark jars to the classpath as well: ;c:\spark\lib* Also, the last line of the cmd file is missing the word "echo", so it should say: echo %SPARK_CMD%
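For illustration only, a minimal sketch of what those two fixes might look like; the exact .cmd file being edited isn't shown in this thread, c:\spark\lib is assumed to be the folder holding the Spark jars, and the backslash-star wildcard form of the classpath entry is an assumption:

rem append the Spark jars to the distribution classpath (assumed location)
set SPARK_DIST_CLASSPATH=%SPARK_DIST_CLASSPATH%;c:\spark\lib\*
rem last line of the launcher .cmd, with the missing "echo" restored
echo %SPARK_CMD%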
The "without Hadoop" in the Spark's build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without it: the user should indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html)
One clean way to fix this issue is to create a spark-env.cmd file looking like this (modify the Hadoop path to match your installation):
@echo off
set HADOOP_HOME=D:\Utils\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%PATH%
set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>
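If you would rather not paste the classpath in by hand, a possible variant (an untested sketch, assuming the same HADOOP_HOME as above) captures the output of hadoop classpath at launch time with a for /f loop:

@echo off
set HADOOP_HOME=D:\Utils\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%PATH%
rem capture the output of "hadoop classpath" into SPARK_DIST_CLASSPATH
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i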
Then place this spark-env.cmd file either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.

I had the same problem; in fact, it's mentioned on the Getting Started page of Spark how to handle it:
### in conf/spark-env.sh ###
# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
If you want to use your own Hadoop, follow one of the 3 options above and copy and paste it into the spark-env.sh file:
1- if you have the hadoop binary on your PATH
2- if you want to point to the hadoop binary explicitly
3- if you want to point to a hadoop configuration folder
http://spark.apache.org/docs/latest/hadoop-provided.html
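As a quick sanity check (a hedged sketch, assuming a Unix-like shell and that spark-env.sh sits in Spark's conf directory, from which the launch scripts source it), you can confirm the variable is populated before starting the shell:

# run the same command spark-env.sh uses, then inspect the result
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | head    # should list Hadoop jar/conf paths
./bin/spark-shell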
For my case, running a Spark job locally differs from running it on a cluster. On a cluster you might have a different dependency/context to follow, so essentially in your pom.xml you might have dependencies declared with provided scope. When running locally, you don't need these provided dependencies; just uncomment them and rebuild again.
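For illustration, a hedged pom.xml fragment of what such a dependency might look like; the artifact and version are assumptions, and the scope element is what you would toggle between cluster and local runs:

<!-- Spark provided by the cluster at runtime; comment the scope out for local runs -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.0</version>
  <scope>provided</scope>
</dependency>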
I encountered the same error. I wanted to install Spark on my Windows PC and therefore downloaded the "without Hadoop" version of Spark, but it turns out you need the Hadoop libraries! So download one of the prebuilt-for-Hadoop Spark versions and set the environment variables.
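For instance, a minimal sketch of the environment variables on Windows (the install paths are assumptions; adjust them to wherever you unpacked Spark and Hadoop):

rem hypothetical install locations
set SPARK_HOME=c:\spark
set HADOOP_HOME=c:\hadoop
set PATH=%SPARK_HOME%\bin;%HADOOP_HOME%\bin;%PATH%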