I was trying to set up a Dataproc cluster that would run only one job (or a specified maximum number of jobs) at a time and keep the rest in the queue.
I have found this solution.
Since the init actions script runs after the cluster is created, the YARN service is already running by the time the script modifies yarn-site.xml.
So after modifying yarn-site.xml and creating the other XML file (fair-scheduler.xml), the YARN ResourceManager service needs to be restarted. It can be done using this command:
sudo systemctl restart hadoop-yarn-resourcemanager.service
Also, since $HADOOP_CONF_DIR was not set (I thought it should be), the full path to the file has to be given. But once the property points to that path, the initial YARN service won't start, because it can't find the file, which is only created later by the init actions script. So what I did was add the last few lines to yarn-site.xml in the init actions script as well. The code for the init actions script is the following:
# Only the master node runs the ResourceManager, so only it needs the change.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
  # Create the allocation file: at most 1 running app per queue by default.
  echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
  echo " <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
  echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml
  # Drop the closing </configuration> line, append the property, then close the file again.
  sed -i '$ d' /etc/hadoop/conf/yarn-site.xml
  echo " <property>" >> /etc/hadoop/conf/yarn-site.xml
  echo " <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
  echo " <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
  echo " </property>" >> /etc/hadoop/conf/yarn-site.xml
  echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml
  # Restart the ResourceManager so it picks up the new configuration.
  systemctl restart hadoop-yarn-resourcemanager.service
fi
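
For the "specified max jobs" case, the same two file edits could be wrapped in small functions. This is only a sketch, not something I have run on a cluster: the function names write_fair_scheduler and add_allocation_property are made up for illustration, and it assumes (like the script above) that the closing </configuration> tag is the last line of yarn-site.xml. It uses here-documents instead of repeated echo calls and skips the yarn-site.xml edit if the property is already there, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: write an allocation file that caps running apps per queue.
#   $1 = path of the allocation file, $2 = max apps per queue
write_fair_scheduler() {
  local alloc_file="$1" max_apps="$2"
  cat > "$alloc_file" <<EOF
<allocations>
  <queueMaxAppsDefault>${max_apps}</queueMaxAppsDefault>
</allocations>
EOF
}

# Hypothetical helper: point yarn-site.xml at the allocation file.
#   $1 = path of yarn-site.xml, $2 = path of the allocation file
# Assumes the last line of yarn-site.xml is the closing </configuration> tag.
add_allocation_property() {
  local site_file="$1" alloc_file="$2" tmp
  # Do nothing if the property was already added by an earlier run.
  if grep -q 'yarn.scheduler.fair.allocation.file' "$site_file"; then
    return 0
  fi
  # Copy everything except the last line, then write it back in place
  # (same effect as sed -i '$ d', without relying on the -i option).
  tmp=$(mktemp)
  sed '$ d' "$site_file" > "$tmp"
  cat "$tmp" > "$site_file"
  rm -f "$tmp"
  # Append the property and close the configuration element again.
  cat >> "$site_file" <<EOF
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${alloc_file}</value>
  </property>
</configuration>
EOF
}

# Possible usage inside the Master branch of the init actions script:
#   write_fair_scheduler /etc/hadoop/conf/fair-scheduler.xml 2
#   add_allocation_property /etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf/fair-scheduler.xml
#   systemctl restart hadoop-yarn-resourcemanager.service
```

The ResourceManager restart is still needed afterwards, for the same reason as in the original script.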