Azure Databricks: How to add Spark configuration in Databricks cluster

南方客 · 2020-12-21 06:51

I am using an Azure Databricks Spark cluster and want to add a customized Spark configuration. There is Databricks documentation on this, but I am not getting any clue how and where to set the configuration.

1 Answer
  • 2020-12-21 07:10

    To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

    1. On the cluster configuration page, click the Advanced Options toggle.
    2. Click the Spark tab and enter one property per line in the Spark config text box, as shown below.
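
    For example, the Spark config box takes space-separated key/value pairs, one per line; the line below (using the same property as the init script further down) is just an illustration:

    spark.sql.sources.partitionOverwriteMode DYNAMIC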

    [OR]

    When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
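
    As a rough sketch, a Create cluster request body can carry the properties in the spark_conf field like this (the cluster name, runtime version, and node type below are placeholder values, not part of the original answer):

    {
      "cluster_name": "my-custom-conf-cluster",
      "spark_version": "7.3.x-scala2.12",
      "node_type_id": "Standard_DS3_v2",
      "num_workers": 2,
      "spark_conf": {
        "spark.executor.memory": "4g",
        "spark.sql.sources.partitionOverwriteMode": "DYNAMIC"
      }
    }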

    To set Spark properties for all clusters, create a global init script:

    %scala
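    // Write a legacy global init script to DBFS; at cluster startup it creates a conf file
    // that sets a driver-side Spark default on every cluster in the workspace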
    dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
      |#!/bin/bash
      |
      |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
      |[driver] {
      |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
      |}
      |EOF
      """.stripMargin, true)
    

    Reference: Databricks - Spark Configuration

    Example: you can pick any Spark configuration you want to test; here I want to specify "spark.executor.memory 4g", so in the Spark config text box the custom configuration is the single line shown below.
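
    spark.executor.memory 4g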

    After the cluster is created, you can verify that the custom configuration took effect.
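
    One simple check (a sketch, not part of the original answer) is to read the value back from a notebook attached to the cluster:

    %scala
    // Returns "4g" if the cluster-level setting took effect
    spark.conf.get("spark.executor.memory")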

    Hope this helps.
