Databricks (Spark): .egg dependencies not installed automatically?

Submitted by 左心房为你撑大大i on 2019-12-03 17:07:19

Question


I have a locally created .egg package that depends on boto==2.38.0. I used setuptools to create the build distribution. Everything works in my own local environment, as it fetches boto correctly from PyPI. However, on Databricks it does not automatically fetch dependencies when I attach the library to the cluster.

I have struggled for a few days now trying to get the dependency installed automatically when the library is loaded on Databricks. I use setuptools; install_requires=['boto==2.38.0'] is the relevant field (a minimal sketch of the setup.py follows below).
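For reference, this is roughly what my setup.py looks like; the package name and version are placeholders, but the install_requires line is exactly the field described above:

```python
# Minimal setup.py sketch; "mypackage" and the version are placeholders,
# but install_requires matches the field described above.
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1.0',
    packages=find_packages(),
    # Dependency that I expect Databricks to resolve from PyPI
    # when the resulting .egg is attached to a cluster.
    install_requires=['boto==2.38.0'],
)
```

The egg itself is built with python setup.py bdist_egg.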

When I install boto directly from PyPI on the Databricks cluster (so not relying on the install_requires field at all) and then attach my own .egg, it does recognize that boto is a package, but it does not recognize any of boto's modules (perhaps because they are not imported in my own .egg's namespace?). So I cannot get my .egg to work. If this problem persists without a solution, I would think that is a really big problem for Databricks users right now. There should be a solution, of course...
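For example, after attaching boto 2.38.0 from PyPI to the cluster, this is a sketch of the kind of check I would expect to work in a notebook (S3Connection is just one example submodule):

```python
# Sketch of a check run in a Databricks notebook after attaching
# boto 2.38.0 from PyPI as a separate library; S3Connection is just
# an example submodule.
import boto
from boto.s3.connection import S3Connection

print(boto.__version__)   # expect '2.38.0'
conn = S3Connection()     # picks up AWS credentials from the environment
```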

Thank you!


Answer 1:


Your application's dependencies will not, in general, work properly if they are diverse and do not have uniform Python-version support. The Databricks docs explain that

Databricks will install the correct version if the library supports both Python 2 and 3. If the library does not support Python 3 then library attachment will fail with an error.

In that case, Databricks will not automatically fetch dependencies when you attach a library to the cluster.
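As an illustration only (not taken from the Databricks docs), a dependency commonly advertises which Python versions it supports through trove classifiers in its own setup.py, which is one way that support is made visible on PyPI:

```python
# Illustrative sketch only: how a dependency typically declares Python 2
# and Python 3 support via trove classifiers in its own setup.py.
from setuptools import setup

setup(
    name='somedependency',   # placeholder name
    version='1.0.0',
    classifiers=[
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
    ],
)
```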



Source: https://stackoverflow.com/questions/32119225/databricks-spark-egg-dependencies-not-installed-automatically
