spark-shell error : No FileSystem for scheme: wasb

怎甘沉沦 submitted on 2019-11-30 15:57:07

Another way to set up Azure Storage (wasb and wasbs files) in spark-shell is:

  1. Copy the azure-storage and hadoop-azure jars into the ./jars directory of the Spark installation.
  2. Run spark-shell with the --jars parameter [a comma-separated list of paths to those jars]. Example:

    $ bin/spark-shell --master "local[*]" --jars jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar
  3. Add the following lines to the Spark Context:

    sc.hadoopConfiguration.set("", "")
    sc.hadoopConfiguration.set("", "my_key")
  4. Run a simple query:

  5. Enjoy :)

With these settings you can easily set up a Spark application by passing the parameters to the 'hadoopConfiguration' of the current Spark Context.
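
The same kind of settings can also be supplied when spark-shell is launched, because Spark copies every `spark.hadoop.*` property into the Hadoop configuration. The sketch below is only an illustration of that alternative, not the method described above. The property names `fs.wasb.impl` and `fs.azure.account.key.<account>.blob.core.windows.net` are assumptions based on the usual hadoop-azure settings, and `my_account` is a hypothetical storage account name.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own storage account and key.
ACCOUNT="my_account"
KEY="my_key"

# Jar paths copied from the example command above.
JARS="jars/hadoop-azure-2.7.0.jar,jars/azure-storage-2.0.0.jar"

# Assumed property names: spark.hadoop.* options are passed on to the Hadoop
# configuration, so they stand in for sc.hadoopConfiguration.set(...) calls.
IMPL_CONF="spark.hadoop.fs.wasb.impl=org.apache.hadoop.fs.azure.NativeAzureFileSystem"
KEY_CONF="spark.hadoop.fs.azure.account.key.${ACCOUNT}.blob.core.windows.net=${KEY}"

# Print the command instead of running it, so it can be reviewed first.
SPARK_CMD="bin/spark-shell --master \"local[*]\" --jars ${JARS} --conf ${IMPL_CONF} --conf ${KEY_CONF}"
echo "$SPARK_CMD"
```

Printing the command rather than executing it keeps the account key out of an accidental run. Once the shell is started this way, a path of the form `wasb://<container>@<account>.blob.core.windows.net/<path>` should be readable, provided the jar versions match the Hadoop version bundled with Spark.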

Hai Ning from Microsoft has written an excellent blog post on how to set up wasb on an Apache Hadoop installation.

Here is the summary:

  1. Add hadoop-azure-*.jar and azure-storage-*.jar to the Hadoop classpath

    1.1 Find the jars in your local installation. On an HDInsight cluster they are in the /usr/hdp/current/hadoop-client folder.

    1.2 Update the HADOOP_CLASSPATH variable. Use the exact jar names, as the Java classpath doesn't support partial wildcards.

  2. Update core-site.xml

    <!-- optionally set the default file system to a container --> 
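
Step 1.2 could be sketched as follows. Java does not accept a partial wildcard such as `hadoop-azure-*.jar`, but the shell can expand the pattern to exact file names before the variable is set. The helper name `add_azure_jars` is hypothetical, and the directory is the HDInsight location from step 1.1; the snippet would go in whatever script sets `HADOOP_CLASSPATH` on your installation.

```shell
#!/bin/sh
# Hypothetical helper: append the Azure jars found in directory $1
# to HADOOP_CLASSPATH, using their exact file names.
add_azure_jars() {
  for jar in "$1"/hadoop-azure-*.jar "$1"/azure-storage-*.jar; do
    # A pattern that matched nothing is left as a literal string; skip it.
    [ -e "$jar" ] || continue
    # Add a ':' separator only when the variable already has content.
    HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$jar"
  done
  export HADOOP_CLASSPATH
}

# Directory from step 1.1 (HDInsight); adjust for other installations.
add_azure_jars /usr/hdp/current/hadoop-client
echo "$HADOOP_CLASSPATH"
```

If the directory holds no matching jars, the variable is left unchanged, which makes a wrong path easy to spot from the printed value.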

See exact steps here:
