Import a GitHub repo into Databricks community edition

Posted by 故事扮演 on 2020-06-01 05:38:45

Question


I am trying to import some data from a public GitHub repo so that I can use it from my Databricks notebooks.

So far I have tried to connect my Databricks account to GitHub as described here, but without success, since GitHub support appears to require a non-community license. I get the following message when I try to set the GitHub token that the GitHub integration requires:

The same question has been asked before on the official Databricks forum.

What is the best way to import and store a GitHub repo on Databricks Community Edition?


Answer 1:


I managed to solve this using shell commands from the notebook itself. To retrieve the repository for the first time, I ran git clone over HTTPS:

%sh git clone https://github.com/SomeDataRepo/TheData.git --depth 1 --branch=master /dbfs/FileStore/TheData/

Why not SSH? SSH requires setting up SSH keys, which was not necessary in my case.

Finally, whenever I need a fresh version of the data, I run git pull before running my program:

%sh git -C /dbfs/FileStore/TheData/ pull
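The two commands above can be folded into a single step that clones on the first run and pulls afterwards. This is only a minimal sketch: `sync_repo` is a hypothetical helper name, and `--ff-only` is my own addition so that the pull never creates a merge commit. In a notebook the whole block would go into one `%sh` cell.

```shell
# Hypothetical helper (not part of the original answer):
# clone the repo on the first run, pull on later runs.
sync_repo() {
  repo_url="$1"            # HTTPS URL of the public repo
  target_dir="$2"          # where the working copy lives
  branch="${3:-master}"    # branch to check out; defaults to master as in the answer
  if [ -d "$target_dir/.git" ]; then
    # A working copy already exists: fetch and fast-forward only.
    git -C "$target_dir" pull --ff-only
  else
    # First run: shallow clone of a single branch.
    git clone --depth 1 --branch "$branch" "$repo_url" "$target_dir"
  fi
}

# Example call with the paths used in the answer:
# sync_repo https://github.com/SomeDataRepo/TheData.git /dbfs/FileStore/TheData/
```

Calling it before each run of the program keeps the data current without having to remember whether the clone already happened.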



Answer 2:


Assuming you have Python installed on your desktop, install the Databricks CLI, clone the Git repo to your local machine, then use the workspace CLI to import the entire repo as a directory.

https://docs.databricks.com/dev-tools/cli/workspace-cli.html
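As a rough sketch of those steps, assuming the legacy Databricks CLI (installed with `pip install databricks-cli` and set up once with `databricks configure`), the clone and the import could be wrapped like this. `import_repo_to_workspace` and all three arguments are hypothetical names chosen for illustration. Run it on the desktop, not in a notebook.

```shell
# Hypothetical helper: clone a repo locally, then import it into the workspace.
# Assumes `git` and the legacy `databricks` CLI are installed and configured.
import_repo_to_workspace() {
  repo_url="$1"        # e.g. https://github.com/SomeDataRepo/TheData.git
  local_dir="$2"       # local checkout directory (your choice)
  workspace_dir="$3"   # target path in the Databricks workspace (your choice)
  # Import only if the clone succeeded.
  git clone --depth 1 "$repo_url" "$local_dir" &&
    databricks workspace import_dir "$local_dir" "$workspace_dir"
}
```

As far as I know, `import_dir` only picks up notebook source files (for example `.py`, `.scala`, `.sql`, `.r`), so a repo that holds mostly data files may be better served by the approach in Answer 1.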



Source: https://stackoverflow.com/questions/61078444/import-a-github-repo-into-databricks-community-edition
