GCP Dataproc - configure YARN fair scheduler

孤街浪徒 2020-12-21 10:55

I was trying to set up a Dataproc cluster that runs only one job (or a specified maximum number of jobs) at a time, with the rest waiting in the queue.

I have found this solutio

1 answer
  • 2020-12-21 11:21

    Because the init actions script runs after the cluster is created, the YARN service is already running by the time the script modifies yarn-site.xml.

    So after modifying that XML config file and creating the other XML file, the YARN ResourceManager service needs to be restarted. This can be done with the following command:

    sudo systemctl restart hadoop-yarn-resourcemanager.service
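
If the restart is scripted, it can help to confirm that the ResourceManager actually came back up before moving on. The following is a minimal sketch, assuming systemd reports the unit state reliably. The helper name `restart_rm_and_wait` and the default of 30 one-second checks are illustrative choices, not something taken from Dataproc:

```shell
# Restart the YARN ResourceManager and poll until systemd reports it active.
# Hypothetical helper: the name and the default of 30 attempts are assumptions.
restart_rm_and_wait() {
    unit="hadoop-yarn-resourcemanager.service"
    attempts="${1:-30}"

    # Give up immediately if the restart request itself fails.
    systemctl restart "${unit}" || return 1

    i=0
    while [ "${i}" -lt "${attempts}" ]; do
        # is-active --quiet exits 0 only when the unit is in the active state.
        if systemctl is-active --quiet "${unit}"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done

    echo "ResourceManager not active after ${attempts} checks" >&2
    return 1
}
```

Calling `restart_rm_and_wait` as root (or through sudo) returns a non-zero status when the service does not come back, which a calling script can use to fail early.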
    

    Also, since $HADOOP_CONF_DIR was not set (I thought it would be), the full path to the file has to be given. But with that in place, the initial YARN service won't start, because it can't find the file, which is only created later by the init actions script. So what I did was to add the last few lines to yarn-site.xml from the init actions script as well. The init actions script is the following:

    # Only the master node runs the ResourceManager, so only touch its config.
    ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
    if [[ "${ROLE}" == 'Master' ]]; then
        # Allocation file: at most one running application per queue.
        echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
        echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
        echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

        # Drop the last line of yarn-site.xml (the closing </configuration> tag).
        sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

        # Append the allocation-file property, then close the document again.
        echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
        echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
        echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
        echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
        echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml

        # Restart so the already-running ResourceManager picks up the change.
        systemctl restart hadoop-yarn-resourcemanager.service
    fi
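
The script above relies on `</configuration>` being the very last line of yarn-site.xml and would add the property a second time if it ran twice. As a hedged alternative, the same two edits can be sketched as small functions that insert the property just before the closing tag and skip the edit when it is already present. The function names `write_allocations` and `set_allocation_file` are made up for this illustration, and the sketch assumes that `</configuration>` sits on a line of its own and that the Fair Scheduler has already been selected through `yarn.resourcemanager.scheduler.class`:

```shell
# Write a minimal Fair Scheduler allocation file.
# $1: path of the allocation file, $2: max running apps per queue (default 1).
write_allocations() {
    alloc_file="$1"
    max_apps="${2:-1}"
    cat > "${alloc_file}" <<EOF
<?xml version="1.0"?>
<allocations>
  <queueMaxAppsDefault>${max_apps}</queueMaxAppsDefault>
</allocations>
EOF
}

# Point yarn-site.xml at the allocation file, at most once.
# $1: path of yarn-site.xml, $2: path of the allocation file.
set_allocation_file() {
    yarn_site="$1"
    alloc_file="$2"

    # Nothing to do if the property is already there (keeps re-runs harmless).
    if grep -qF '<name>yarn.scheduler.fair.allocation.file</name>' "${yarn_site}"; then
        return 0
    fi

    tmp="$(mktemp)" || return 1
    # Print the new property right before the first closing </configuration> line.
    if awk -v path="${alloc_file}" '
        /<\/configuration>/ && !inserted {
            print "  <property>"
            print "    <name>yarn.scheduler.fair.allocation.file</name>"
            print "    <value>" path "</value>"
            print "  </property>"
            inserted = 1
        }
        { print }
    ' "${yarn_site}" > "${tmp}"; then
        # Overwrite in place so the original owner and permissions are kept.
        cat "${tmp}" > "${yarn_site}"
        status=$?
    else
        status=1
    fi
    rm -f "${tmp}"
    return "${status}"
}
```

On the master node this could be used as `CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"`, then `write_allocations "${CONF_DIR}/fair-scheduler.xml" 1` and `set_allocation_file "${CONF_DIR}/yarn-site.xml" "${CONF_DIR}/fair-scheduler.xml"`, followed by the ResourceManager restart. The fallback to /etc/hadoop/conf covers the case described above where $HADOOP_CONF_DIR is not set.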
    