Azure Databricks: How to add Spark configuration in Databricks cluster


Question


I am using a Databricks Spark cluster and want to add a customized Spark configuration.
There is Databricks documentation on this, but it does not give me any clue as to how and what changes I should make. Can someone please share an example of how to configure a Databricks cluster?
Is there any way to see the default Spark configuration in a Databricks cluster?


Answer 1:


To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

  1. On the cluster configuration page, click the Advanced Options toggle.
  2. Click the Spark tab and enter your properties in the Spark config text box, one "key value" pair per line (see the example below).
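
For instance, a minimal sketch of what the Spark config text box might contain (the property names are standard Spark settings; the values are illustrative, not recommendations):

  spark.executor.memory 4g
  spark.sql.shuffle.partitions 200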

[OR]

When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
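
As a rough sketch, the spark_conf field might appear in a Create cluster request body like this (the cluster name, Spark version, and node type are placeholders, not values from the original answer):

  {
    "cluster_name": "my-cluster",
    "spark_version": "6.4.x-scala2.11",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "spark_conf": {
      "spark.executor.memory": "4g",
      "spark.sql.sources.partitionOverwriteMode": "DYNAMIC"
    }
  }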

To set Spark properties for all clusters, create a global init script:

%scala
// Write a global init script to DBFS; when the cluster starts, the script
// creates a driver-side Spark defaults conf file containing the property.
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh", """
  |#!/bin/bash
  |
  |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
  |[driver] {
  |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
  |}
  |EOF
  """.stripMargin, true)

Reference: Databricks - Spark Configuration

Example: You can pick any Spark configuration property you want to test; here I specify "spark.executor.memory 4g" in the Spark config text box under Advanced Options.

After the cluster is created, you can check the result of the custom configuration.
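
A minimal sketch of how you might verify this from a notebook attached to the cluster (this also covers the question about inspecting the current configuration):

%scala
// Read back the property set on the cluster; with the example above this should print 4g
println(spark.conf.get("spark.executor.memory"))

// List all configuration values visible to this Spark session
spark.conf.getAll.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }

// Cluster-level SparkConf entries, including spark.* properties set on the cluster
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k = $v") }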

Hope this helps.



Source: https://stackoverflow.com/questions/58688544/azure-databricks-how-to-add-spark-configuration-in-databricks-cluster
