Question
How can I get a list of files from Azure Blob Storage in Spark and Scala?
I have no idea how to approach this.
Answer 1:
I don't know whether the Spark you use runs on Azure or locally, so there are two cases, but they are similar.

For Spark running locally, there is an official blog which introduces how to access Azure Blob Storage from Spark. The key is that you need to configure the Azure Storage account as HDFS-compatible storage in the `core-site.xml` file and add the two jars `hadoop-azure` & `azure-storage` to your classpath, so that HDFS can be accessed via the `wasb[s]` protocol. You can refer to the official tutorial on HDFS-compatible storage with `wasb`, and to the blog about configuration for HDInsight, for more details.

For Spark running on Azure, the only difference is that HDFS is accessed with `wasb`; the other preparations have already been done by Azure when creating an HDInsight cluster with Spark.
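As a rough sketch of that configuration (the account name and key below are placeholders, not values from the question), the `core-site.xml` entry for a storage account might look like:

```xml
<!-- core-site.xml: registers the storage account key so the wasb[s]
     filesystem can authenticate. YOUR_ACCOUNT / YOUR_ACCOUNT_KEY are
     placeholders for your own storage account name and access key. -->
<configuration>
  <property>
    <name>fs.azure.account.key.YOUR_ACCOUNT.blob.core.windows.net</name>
    <value>YOUR_ACCOUNT_KEY</value>
  </property>
</configuration>
```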
The methods for listing files are `listFiles` or `wholeTextFiles` of `SparkContext`.
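A minimal sketch of listing files this way, assuming the container `mycontainer`, account `myaccount`, and path `data/` are placeholders, and that the `hadoop-azure` and `azure-storage` jars plus the `core-site.xml` configuration are in place:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder names throughout; requires a working wasb[s] setup.
val sc = new SparkContext(new SparkConf().setAppName("ListBlobs"))

// wholeTextFiles returns an RDD of (path, content) pairs;
// taking the keys yields just the file paths.
val paths = sc
  .wholeTextFiles("wasbs://mycontainer@myaccount.blob.core.windows.net/data/")
  .keys
  .collect()

paths.foreach(println)
```

Note that `wholeTextFiles` reads the file contents as well, so for large containers it may be cheaper to list paths via the Hadoop `FileSystem` API (`FileSystem.get(uri, sc.hadoopConfiguration)` and then `listFiles`).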
Hope it helps.
Answer 2:
If you are using Databricks, try the following:

dbutils.fs.ls("blob_storage_location")
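A hedged sketch of using the result (the path is a placeholder, and `dbutils` is only available inside Databricks notebooks):

```scala
// dbutils.fs.ls returns a sequence of FileInfo objects,
// each with path, name, and size fields. Path is a placeholder.
val files = dbutils.fs.ls("wasbs://mycontainer@myaccount.blob.core.windows.net/data/")
files.foreach(f => println(s"${f.path}  ${f.size} bytes"))
```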
Source: https://stackoverflow.com/questions/43474304/how-to-get-list-of-file-from-azure-blob-using-spark-scala