Why does “databricks-connect test” not work after configuring Databricks Connect?

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 03:11:47

Question


I want to run my Spark processes directly on my cluster using IntelliJ IDEA, so I'm following this documentation: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html

After configuring everything, I run databricks-connect test, but I don't get the Scala REPL that the documentation says I should see.

This is my cluster configuration:


Answer 1:


Your problem looks like it is one of the following:

a) You specified the wrong port (it has to be 8787 on Azure)
b) You didn't open up the port in your Databricks cluster
c) You didn't install WinUtils properly (e.g. you forgot to set the environment variable)
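To quickly rule out (a) and (b), you can check whether the service port is even reachable from your machine. A minimal sketch in Python; the workspace hostname below is a placeholder, substitute your own cluster address:

```python
import socket

def port_reachable(host, port, timeout=3.0):
    # Try a plain TCP connection; True means something is listening on that port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical Azure workspace address -- replace with your own.
print(port_reachable("example.azuredatabricks.net", 8787))
```

If this prints False, the problem is the port or a firewall, not Databricks Connect itself.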

If you can understand German by any chance, this YouTube video might help you (it shows the full installation process for Windows 10):

https://www.youtube.com/watch?v=VTfRh_RFzOs&t=3s




Answer 2:


Your Python version should be 3.5, as per the link you posted. Are you behind a proxy or on a network that may have a layer-7 firewall? Everything else you have done looks correct, so I would try on another network.

Have you set:

spark.databricks.service.server.enabled true
spark.databricks.service.port 8787

IMPORTANT: I would rotate your API key. You have published your org ID and key in the post, which means anyone can access your workspace now.




Answer 3:


Try running the Databricks examples, like:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .getOrCreate()

print("Testing simple count")
# The Spark code will execute on the Databricks cluster.
print(spark.range(100).count())

This worked for me.

Maybe they will fix databricks-connect test.




Answer 4:


I solved the problem. The problem was the versions of the tools:

  • Install Java

Download and install Java SE Runtime Version 8.

Download and install Java SE Development Kit 8.

  • Install Conda

You can either download and install full blown Anaconda or use miniconda.

  • Download WinUtils

This pesky bugger is part of Hadoop and is required by Spark to work on Windows. For a quick install, open PowerShell (as an admin) and run the following (if you are on a corporate network with funky security you may need to download the exe manually):

New-Item -Path "C:\Hadoop\Bin" -ItemType Directory -Force
Invoke-WebRequest -Uri https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe -OutFile "C:\Hadoop\Bin\winutils.exe"
[Environment]::SetEnvironmentVariable("HADOOP_HOME", "C:\Hadoop", "Machine")
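After running the commands above (open a new shell first, so the machine-level variable is picked up), you can sanity-check the WinUtils setup. A small Python sketch, assuming the paths used above:

```python
import os

def winutils_status():
    # Checks that HADOOP_HOME is set and that Bin\winutils.exe exists under it.
    home = os.environ.get("HADOOP_HOME")
    if not home:
        return "HADOOP_HOME is not set"
    exe = os.path.join(home, "Bin", "winutils.exe")
    return "ok" if os.path.isfile(exe) else "missing " + exe

print(winutils_status())
```

Anything other than "ok" points at symptom (c) from the first answer.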
  • Create Virtual Environment

We will now create a new virtual environment. I recommend creating one environment per project you are working on. This allows us to install different versions of Databricks-Connect per project and upgrade them separately.

From the Start menu find the Anaconda Prompt. When it opens it will have a default prompt of something like:

(base) C:\Users\User

The "base" part means you are not in a virtual environment, but rather the base install. To create a new environment, execute this:

conda create --name dbconnect python=3.5

Where dbconnect is the name of your environment and can be whatever you want. Databricks currently runs Python 3.5, so your Python version must match. Again, this is another good reason to have an environment per project, as this may change in the future.
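Since a mismatch between your local interpreter and the cluster's Python is a common cause of databricks-connect test failures, a quick check (here assuming the cluster runs Python 3.5, as above):

```python
import sys

def interpreter_matches(required=(3, 5)):
    # True when the local Python major.minor matches the cluster's runtime.
    return tuple(sys.version_info[:2]) == tuple(required)

print(interpreter_matches())
```

Run this inside the activated conda environment; it should print True before you go any further.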

  • Now activate the environment:

    conda activate dbconnect

  • Install Databricks-Connect

You are now good to go:

pip install -U databricks-connect==5.3.*

databricks-connect configure

  • Create a Databricks cluster (in this case I used Amazon Web Services)

spark.databricks.service.server.enabled true
spark.databricks.service.port 15001 (Amazon 15001, Azure 8787)
  • Turn Windows Defender Firewall off or allow access.


Source: https://stackoverflow.com/questions/55951981/why-databricks-connect-test-does-not-work-after-configurate-databricks-connect
