Error using pyspark with WASB/Connecting Pyspark with Azure Blob

Submitted by 陌路散爱 on 2019-12-24 06:35:47

Question


I'm currently working on connecting an Azure blob store with PySpark and am having difficulty getting the two connected and running. I have installed both required jar files (hadoop-azure-3.2.0-javadoc.jar and azure-storage-8.3.0-javadoc.jar). I set them to be read in my SparkConf using SparkConf().setAll(), and once I start the session I use:

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")

but it always returns

java.io.IOException: No FileSystem for scheme: wasbs

Any thoughts?
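For context on what the error means: Hadoop picks the FileSystem class for a URI by looking up the configuration key fs.&lt;scheme&gt;.impl (or a ServiceLoader registration shipped inside the implementation jar, which a -javadoc jar does not contain). The snippet below is a simplified pure-Python model of that lookup, not real Hadoop code; the resolve_filesystem helper and the conf dict are hypothetical, for illustration only. It shows why setting only "fs.azure" leaves the wasbs scheme unmapped:

```python
from urllib.parse import urlparse

def resolve_filesystem(uri, conf):
    """Simplified model of Hadoop's lookup: the FileSystem class
    for a URI is taken from the key fs.<scheme>.impl."""
    scheme = urlparse(uri).scheme
    impl = conf.get("fs.%s.impl" % scheme)
    if impl is None:
        raise IOError("No FileSystem for scheme: %s" % scheme)
    return impl

# Setting only "fs.azure" does not map the wasbs scheme:
conf = {"fs.azure": "org.apache.hadoop.fs.azure.NativeAzureFileSystem"}
try:
    resolve_filesystem("wasbs://container@acctname.blob.core.windows.net/", conf)
except IOError as err:
    print(err)  # No FileSystem for scheme: wasbs

# Mapping the scheme key makes the lookup succeed:
conf["fs.wasbs.impl"] = "org.apache.hadoop.fs.azure.NativeAzureFileSystem"
print(resolve_filesystem("wasbs://container@acctname.blob.core.windows.net/", conf))
```

In real Hadoop the mapping is normally provided by the hadoop-azure class jar itself, so the error usually indicates the implementation classes are not on the classpath at all.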

I've followed the following:

https://github.com/Azure/mmlspark/issues/456

PySpark java.io.IOException: No FileSystem for scheme: https

spark-shell error : No FileSystem for scheme: wasb

import findspark

findspark.init('dir/spark/spark-2.4.0-bin-hadoop2.7')

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SQLContext

conf = SparkConf().setAll([
    (u'spark.submit.pyFiles', u'/dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar,/dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,/dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,/dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,/dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,/dir/.ivy2/jars/joda-time_joda-time-2.3.jar,/dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,/dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,/dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.app.id', u'local-1553969107475'),
    (u'spark.driver.port', u'38809'),
    (u'spark.executor.id', u'driver'),
    (u'spark.app.name', u'PySparkShell'),
    (u'spark.driver.host', u'test-VM'),
    (u'spark.sql.catalogImplementation', u'hive'),
    (u'spark.rdd.compress', u'True'),
    (u'spark.serializer.objectStreamReset', u'100'),
    (u'spark.master', u'local[*]'),
    (u'spark.submit.deployMode', u'client'),
    (u'spark.repl.local.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.files', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar,file:///dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,file:///dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar'),
    (u'spark.ui.showConsoleProgress', u'true')
])

sc = SparkContext(conf=conf)
spark = SparkSession(sc)

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")

Returns

java.io.IOException: No FileSystem for scheme: wasbs
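For comparison, the shape of a configuration that typically works for wasbs access looks like the untested sketch below. Note that the jars listed in the question are -javadoc artifacts, which contain only HTML documentation and no classes; the class jars (hadoop-azure and azure-storage without the -javadoc suffix) are what the classpath needs. The package versions, account name, container, and key here are placeholders, and fs.wasbs.impl is the commonly suggested scheme mapping, not something confirmed by the question:

```python
from pyspark.sql import SparkSession

# Pull in the class jars (not the -javadoc ones). Versions are
# placeholders; they should match the installed Hadoop build.
spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.apache.hadoop:hadoop-azure:2.7.3,"
                 "com.microsoft.azure:azure-storage:8.3.0")
         .getOrCreate())

hconf = spark.sparkContext._jsc.hadoopConfiguration()
# Map the wasbs scheme to an implementation and supply the account key.
hconf.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
hconf.set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")
```

This is a configuration sketch only; it requires a live Spark installation and a real storage account to run.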

Source: https://stackoverflow.com/questions/57082556/error-using-pyspark-with-wasb-connecting-pyspark-with-azure-blob
