NoClassDefFoundError org.apache.hadoop.fs.FSDataInputStream when executing spark-shell

北荒 2020-11-30 02:07

I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (for user-provided Hadoop). When I ran the spark-shell command, I got this error:

> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

14 answers
  • 2020-11-30 02:23

    You should add these jars to your classpath (one way to do that is sketched after the list):

    1. commons-cli-1.2.jar
    2. hadoop-common-2.7.2.jar
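    For instance, with the "without Hadoop" build you can put them on Spark's distribution classpath before launching the shell. A minimal sketch, assuming the two jars live under /opt/extra-jars (the path is hypothetical):

        # make the Hadoop/commons-cli jars visible to the Spark launcher (example paths)
        export SPARK_DIST_CLASSPATH="/opt/extra-jars/commons-cli-1.2.jar:/opt/extra-jars/hadoop-common-2.7.2.jar:$SPARK_DIST_CLASSPATH"
        ./bin/spark-shell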
  • 2020-11-30 02:24

    Thank you so much. That worked great, but I had to add the Spark jars to the classpath as well: ;c:\spark\lib* . Also, the last line of the cmd file is missing the word "echo", so it should read: echo %SPARK_CMD%

  • 2020-11-30 02:25

    The "without Hadoop" in the Spark's build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without it: the user should indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html)

    One clean way to fix this issue is to:

    1. Obtain Hadoop Windows binaries. Ideally build them, but this is painful (for some hints see: Hadoop on Windows Building/ Installation Error). Otherwise grab prebuilt ones; for instance, 2.6.0 can currently be downloaded from here: http://www.barik.net/archive/2015/01/19/172716/
    2. Create a spark-env.cmd file looking like this (adjust the Hadoop path to match your installation; a sketch that fills in the classpath automatically follows this list):

        @echo off
        set HADOOP_HOME=D:\Utils\hadoop-2.7.1
        set PATH=%HADOOP_HOME%\bin;%PATH%
        set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>

    3. Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.
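    If you would rather not paste the output of hadoop classpath by hand in step 2, a small variation of spark-env.cmd (an untested sketch, assuming the same HADOOP_HOME as above) captures it automatically:

        @echo off
        set HADOOP_HOME=D:\Utils\hadoop-2.7.1
        set PATH=%HADOOP_HOME%\bin;%PATH%
        rem run "hadoop classpath" and store its single output line in SPARK_DIST_CLASSPATH
        for /f "delims=" %%A in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%A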
  • 2020-11-30 02:25

    I had the same problem; in fact, the Getting Started page of Spark explains how to handle it:

    ### in conf/spark-env.sh ###
    
    # If 'hadoop' binary is on your PATH
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)
    
    # With explicit path to 'hadoop' binary
    export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
    
    # Passing a Hadoop configuration directory
    export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
    

    If you want to use your own Hadoop, pick one of the three options above and copy the corresponding line into your conf/spark-env.sh file (an example for option 1 follows the link below):

    1- use this if the hadoop binary is on your PATH

    2- use this to point at the hadoop binary explicitly

    3- use this to also pass a Hadoop configuration directory

    http://spark.apache.org/docs/latest/hadoop-provided.html
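    As a concrete example of option 1 (just a sketch; it assumes you run it from the Spark base folder and that the hadoop binary is already on your PATH):

        # create conf/spark-env.sh from the shipped template and append the classpath line
        cp conf/spark-env.sh.template conf/spark-env.sh
        echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> conf/spark-env.sh
        # spark-shell should now find the Hadoop classes such as FSDataInputStream
        ./bin/spark-shell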

  • 2020-11-30 02:32

    In my case:

    Running a Spark job locally differs from running it on a cluster. On a cluster the environment supplies part of the dependencies/context for you, so in your pom.xml you might have dependencies declared with the provided scope.

    When running locally, nothing provides those dependencies for you, so comment the provided scope out (see the pom.xml sketch below) and rebuild.
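    For illustration, a hypothetical pom.xml fragment (artifact and version are examples, not taken from the question); the scope line is what you toggle between cluster and local runs:

        <!-- Spark core is supplied by the cluster at runtime -->
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>1.4.0</version>
          <!-- comment this line out for local runs so the jar lands on the classpath -->
          <scope>provided</scope>
        </dependency>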

  • 2020-11-30 02:35

    I encountered the same error. I wanted to install Spark on my Windows PC and therefore downloaded the "without Hadoop" version of Spark, but it turns out you do need the Hadoop libraries. So download a Spark version built for Hadoop instead and set the environment variables, for example:
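    A sketch for the current Windows console session (the package name and install paths are examples only; use System Properties or setx to persist them):

        :: point SPARK_HOME at a Hadoop-bundled build and put the bin folders on PATH
        set SPARK_HOME=C:\spark-1.4.0-bin-hadoop2.6
        set HADOOP_HOME=C:\hadoop
        set PATH=%SPARK_HOME%\bin;%HADOOP_HOME%\bin;%PATH%
        spark-shell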
