How to add Spark packages to a SparkR notebook on DSX?

橙三吉。 Submitted on 2019-12-24 10:56:46

Question


The Spark documentation shows how a Spark package can be added:

sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session.
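For reference, here is a minimal sketch of that session-scoped approach in a standalone SparkR environment where you create the session yourself (which, as I understand it, is not the case in a DSX notebook, where the session is pre-created):

library(SparkR)

# sparkPackages is only honoured at session creation time; the listed
# Maven coordinates are fetched and added to the session's classpath.
sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")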

How can we add spark packages for SparkR using a notebook on DSX?


Answer 1:


Use the PixieDust package manager to install the Avro package:

import pixiedust

pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html

Install it from a Python 1.6 kernel, since PixieDust is importable from Python. (Remember, this installs the package at your Spark instance level.) Once it is installed, restart the kernel, switch to the R kernel, and then read the Avro file like this:

# 'episodes.avro' must be reachable from the notebook's working directory
df1 <- read.df("episodes.avro", source = "com.databricks.spark.avro")

head(df1)
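As a quick sanity check (a sketch reusing df1 from above; the output path episodes_copy.avro is just an illustration), you can confirm the Avro source is working by inspecting the schema and writing the data back out:

# Print the schema inferred from the Avro file
printSchema(df1)

# Write the DataFrame back out in Avro format (hypothetical output path)
write.df(df1, path = "episodes_copy.avro", source = "com.databricks.spark.avro", mode = "overwrite")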

Complete notebook:

https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/sparkRPackageTest.ipynb

Thanks, Charles.



Source: https://stackoverflow.com/questions/42279520/how-to-add-spark-packages-to-spark-r-notebook-on-dsx
