GCP Dataproc - configure YARN fair scheduler

前端未结

关注

 1  1101

I was trying to set up a dataproc cluster that would compute only one job (or specified max jobs) at a time and the rest would be in queue.

I have found this solutio

相关标签:

1条回答

逝去的感伤

2020-12-21 11:21

as init actions script is running after the cluster is created, the yarn service is already running in the time when the script modify the yarn-site.xml.

So after modifying the xml config file and creating the other xml file, the yarn service needs to be restarted. It can be done using this command:

sudo systemctl restart hadoop-yarn-resourcemanager.service

Also, since the $HADOOP_CONF_DIR was not set (I thought it should be), its needed to input the whole path to the file. But, after that, the initial YARN service won't start, because it can't find the file that is created later in init actions script. So, what I did is to add the last few lines to yarn-site.xml in the init actions script as well. The code for init actions script is the following:

ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
    echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
    echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
    echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

    sed -i '$ d' /etc/hadoop/conf/yarn-site.xml

    echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
    echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
    echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml
    systemctl restart hadoop-yarn-resourcemanager.service
fi

0 讨论(0)