Using SparkR and sparklyr simultaneously

Submitted by 时光怂恿深爱的人放手 on 2019-12-18 08:48:02

Question


As far as I understand, these two packages provide similar but mostly different wrapper functions for Apache Spark. sparklyr is newer and its functionality is still growing. I therefore think one currently needs to use both packages to get the full range of functionality.

Since both packages essentially wrap references to JVM instances of Scala classes, I would guess it should be possible to use them in parallel. But is it actually possible? What are your best practices?


Answer 1:


These two packages use different mechanisms and are not designed for interoperability. Their internals are structured differently, and they do not expose the JVM backend in the same manner.

While one could imagine a solution allowing partial data sharing (global temporary views come to mind, or tables registered in a persistent metastore shared by both sessions), it would have rather limited applications.
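As a rough illustration of the metastore idea: if both sessions are configured against the same Hive metastore and warehouse directory (an assumption — by default each session may use its own local Derby metastore, in which case this will not work), a table persisted from sparklyr can later be read from SparkR, because the metastore rather than the session owns the table metadata. The table name and `master` setting below are illustrative:

```r
# Hedged sketch: assumes both sessions share one Hive metastore/warehouse.
library(sparklyr)

# sparklyr side: write a persistent (metastore-backed) table
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tmp")
spark_write_table(mtcars_tbl, "mtcars_shared")  # registered in the metastore
spark_disconnect(sc)

# SparkR side: a separate session resolves the same table via the metastore
library(SparkR)
sparkR.session()
df <- SparkR::sql("SELECT * FROM mtcars_shared")
head(df)
```

Note that attaching both packages in one R session masks a number of function names, so it is safer to qualify calls with `SparkR::` or `sparklyr::` explicitly. Plain global temporary views, by contrast, live only inside a single Spark application, so they cannot be seen from the second package's independently created session.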

If you need both, I'd recommend splitting your pipeline into multiple steps and passing data between them using persistent storage.
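A minimal sketch of that recommendation, with illustrative paths and a local master: step 1 uses sparklyr to prepare data and write it to Parquet; step 2 starts a fresh SparkR session that picks the data up from disk. Nothing is shared except the files themselves:

```r
# Sketch of a two-step pipeline handing data off through Parquet files.
library(sparklyr)

# Step 1 (sparklyr): load, transform, and persist
sc <- spark_connect(master = "local")
cars_tbl <- copy_to(sc, mtcars, "cars")
spark_write_parquet(cars_tbl, "/tmp/pipeline/cars")  # illustrative path
spark_disconnect(sc)

# Step 2 (SparkR): continue from persistent storage in a new session
library(SparkR)
sparkR.session()
df <- SparkR::read.df("/tmp/pipeline/cars", source = "parquet")
head(SparkR::collect(df))
```

Parquet is a natural handoff format here since both packages read and write it through Spark's own data sources, but any persistent sink both sides support (ORC, JDBC, metastore tables) would serve the same role.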



Source: https://stackoverflow.com/questions/40577650/using-sparkr-and-sparklyr-simultaneously
