How can I read a XML file Azure Databricks Spark

问题

I was looking for some info on the MSDN forums but couldn't find a good forum/ While reading on the spark site I've the hint that here I would have better chances. So bottom line, I want to read a Blob storage where there is a contiguous feed of XML files, all small files, finaly we store these files in a Azure DW. Using Azure Databricks I can use Spark and python, but I can't find a way to 'read' the xml type. Some sample script used a library xml.etree.ElementTree but I can't get it imported.. So any help pushing me a a good direction is appreciated.

回答1:

One way is to use the databricks spark-xml library :

Import the spark-xml library into your workspace https://docs.databricks.com/user-guide/libraries.html#create-a-library (search spark-xml in the maven/spark package section and import it)
Attach the library to your cluster https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster
Use the following code in your notebook to read the xml file, where "note" is the root of my xml file.

xmldata = spark.read.format('xml').option("rootTag","note").load('dbfs:/mnt/mydatafolder/xmls/note.xml')

Example :

来源：https://stackoverflow.com/questions/52728741/how-can-i-read-a-xml-file-azure-databricks-spark

标签

azure

apache-spark

databricks

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!