Azure Databricks cluster init script - install python wheel

微笑、不失礼 submitted on 2020-04-18 04:00:52

Question


I have a Python script that mounts a storage account in Databricks and then installs a wheel from the storage account. I am trying to run it as a cluster init script, but it keeps failing. My script is of the form:

#/databricks/python/bin/python
mount_point = "/mnt/...."
configs = {....}
source = "...."
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)
dbutils.library.install("dbfs:/mnt/.....")
dbutils.library.restartPython()

It works when I run it directly in a notebook, but if I save it to a file called dbfs:/databricks/init_scripts/datalakes/init.py and use it as a cluster init script, the cluster fails to start and the error message says that the init script exited with a non-zero status. I've checked the logs, and it appears that the file is being run by bash instead of Python:

bash: line 1: mount_point: command not found

I have tried running the Python script from a bash script called init.bash containing this one line:

/databricks/python/bin/python "dbfs:/databricks/init_scripts/datalakes/init.py"

The cluster using init.bash then also fails to start, and the logs say that Python can't find the file:

/databricks/python/bin/python: can't open file 'dbfs:/databricks/init_scripts/datalakes/init.py': [Errno 2] No such file or directory
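The second error arises because `dbfs:/...` is a URI understood by Databricks APIs such as `dbutils`, not a filesystem path; a process started from a shell script, like `python` here, can only reach DBFS through the local `/dbfs` mount. A minimal sketch of that translation follows, assuming the cluster exposes DBFS at `/dbfs` (the answer's init script relies on the same mount); `dbfs_uri_to_local_path` is a hypothetical helper, not a Databricks API.

```python
from pathlib import PurePosixPath

# Local mount point where DBFS is exposed on cluster nodes (assumed available).
DBFS_FUSE_ROOT = PurePosixPath("/dbfs")


def dbfs_uri_to_local_path(uri: str) -> str:
    """Translate a dbfs:/ URI into the equivalent path under the /dbfs mount.

    Hypothetical helper for illustration: it only rewrites the string and
    does not check that the file exists.
    """
    prefix = "dbfs:"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs: URI: {uri!r}")
    # Strip the scheme and any leading slashes, then re-root under /dbfs.
    relative = uri[len(prefix):].lstrip("/")
    return str(DBFS_FUSE_ROOT / relative)


if __name__ == "__main__":
    print(dbfs_uri_to_local_path("dbfs:/databricks/init_scripts/datalakes/init.py"))
```

With that mapping, the one-line init.bash would point at `/dbfs/databricks/init_scripts/datalakes/init.py`. Even then, `dbutils` is not defined in a standalone Python process, so the mount step would still fail; this is why the answer moves the mounting into a notebook.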

Can anyone tell me how I could get this working please?

Related question: Azure Databricks cluster init script - Install wheel from mounted storage


Answer 1:


The solution I went with was to run a notebook which mounts the storage and creates a bash init script that just installs the wheel. Something like this:

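mount_point = "/mnt/...."
configs = {....}
source = "...."
# Mount the storage from the notebook; DBFS mounts persist across clusters in the
# workspace, so the init script itself never needs dbutils.
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(source = source, mount_point = mount_point, extra_configs = configs)

# Write a bash init script that only installs the wheel. The absolute /dbfs path
# (the local view of DBFS) replaces the original relative "../../../dbfs/mnt/...",
# so the result does not depend on the script's working directory.
dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", """#!/bin/bash
/databricks/python/bin/pip install "/dbfs/mnt/package-source/parser-3.0-py3-none-any.whl"
""", True)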
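Building the script text in a separate function makes the shell quoting easier to get right than nesting it inside the `dbutils.fs.put` call. A minimal sketch, assuming the storage is already mounted at `/mnt/package-source` as in the answer and visible at `/dbfs/mnt/...` on the cluster; `build_wheel_init_script` is a hypothetical helper, and its result would be passed as the contents argument of `dbutils.fs.put(path, contents, True)` in a notebook.

```python
import shlex


def build_wheel_init_script(wheel_path: str,
                            pip: str = "/databricks/python/bin/pip") -> str:
    """Return the text of a bash init script that pip-installs one wheel.

    Hypothetical helper: wheel_path should be a local path such as
    /dbfs/mnt/..., because pip inside the init script cannot read dbfs:/ URIs.
    """
    lines = [
        "#!/bin/bash",
        "set -e  # stop at the first failing command so the failure is reported",
        # shlex.quote protects paths containing spaces or shell metacharacters.
        f"{shlex.quote(pip)} install {shlex.quote(wheel_path)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_wheel_init_script(
        "/dbfs/mnt/package-source/parser-3.0-py3-none-any.whl")
    print(script)
    # Inside a Databricks notebook only (dbutils does not exist elsewhere):
    # dbutils.fs.put("dbfs:/databricks/init_scripts/datalakes/init.bash", script, True)
```

Keeping the generator free of `dbutils` means the script text can be checked outside Databricks before it is written to DBFS.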


Source: https://stackoverflow.com/questions/61077447/azure-databricks-cluster-init-script-install-python-wheel
