Reading data from Azure Blob with Spark

Posted by 时间秒杀一切 on 2019-11-26 16:32:09

Question


I am having an issue reading data from Azure blobs via Spark Streaming.

JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");

Code like the above works for HDFS, but it is unable to read files from an Azure blob:

https://blobstorage.blob.core.windows.net/containerid/folder1/

The above is the path shown in the Azure UI, but it doesn't work. Am I missing something, and how can we access it?

I know Event Hubs are the ideal choice for streaming data, but my current situation demands the use of storage rather than queues.


Answer 1:


To read data from Blob storage, two things need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means that you also need the Hadoop-Azure JAR to be available on your classpath (note that there may be runtime requirements for more JARs from the Hadoop family):

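import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;
JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
// Tell Hadoop which file system class backs Azure Blob storage.
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
// Access key for the storage account; replace "youraccount" and "yourkey".
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");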

Second, refer to the files using the wasb:// prefix (the [s] is optional and selects a secure connection, i.e. wasbs://):

ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");

It goes without saying that you'll need proper permissions set for the location that queries Blob storage.
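
To connect this with the URL from the question, here is a minimal sketch of how the portal-style https:// address could be rewritten into the wasb[s]:// form shown above. `WasbPaths` and its methods are hypothetical helpers written for illustration; they are not part of Spark or Hadoop. The sketch assumes the standard `<account>.blob.core.windows.net` host and that the first path segment is the container name.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Spark or Hadoop.
final class WasbPaths {
    private static final String HOST_SUFFIX = ".blob.core.windows.net";

    private WasbPaths() {}

    // Builds wasb[s]://<container>@<account>.blob.core.windows.net/<path>.
    static String wasbUrl(String container, String account, String path, boolean secure) {
        String scheme = secure ? "wasbs" : "wasb";
        String cleanPath = path.startsWith("/") ? path.substring(1) : path;
        return scheme + "://" + container + "@" + account + HOST_SUFFIX + "/" + cleanPath;
    }

    // Name of the Hadoop property that holds the access key for an account,
    // following the pattern used in the answer above.
    static String accountKeyProperty(String account) {
        return "fs.azure.account.key." + account + HOST_SUFFIX;
    }

    // Converts http(s)://<account>.blob.core.windows.net/<container>/<path>
    // into the matching wasb(s):// URL. https maps to wasbs, http to wasb.
    static String fromHttpsUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null || !host.endsWith(HOST_SUFFIX)) {
            throw new IllegalArgumentException("Not a blob.core.windows.net URL: " + url);
        }
        String account = host.substring(0, host.length() - HOST_SUFFIX.length());

        String fullPath = uri.getPath();
        if (fullPath == null || fullPath.length() < 2) {
            throw new IllegalArgumentException("URL has no container segment: " + url);
        }
        // Split "/container/rest/of/path" into the container and the remainder.
        String[] parts = fullPath.substring(1).split("/", 2);
        String container = parts[0];
        String path = parts.length > 1 ? parts[1] : "";

        boolean secure = "https".equalsIgnoreCase(uri.getScheme());
        return wasbUrl(container, account, path, secure);
    }

    public static void main(String[] args) {
        // The URL from the question.
        System.out.println(fromHttpsUrl(
            "https://blobstorage.blob.core.windows.net/containerid/folder1/"));
    }
}
```

For the URL in the question this yields wasbs://containerid@blobstorage.blob.core.windows.net/folder1/, which is the string you would pass to ssc.textFileStream once the Hadoop configuration above is in place.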




Answer 2:


As a supplement, there is a very helpful tutorial on using HDFS-compatible Azure Blob storage with Hadoop; please see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage.

There is also an official sample on GitHub for Spark Streaming on Azure. Unfortunately, the sample is written in Scala, but I think it's still helpful for you.



Source: https://stackoverflow.com/questions/37763472/reading-data-from-azure-blob-with-spark
