java: inconsistent watchdog timeout in systemd-notify

≯℡__Kan透↙ posted on 2019-12-05 10:29:50

Here's the JNA code that solved the problem:

import com.sun.jna.Library;
import com.sun.jna.Native;

/**
 * The task issues a notification to the systemd watchdog. The systemd watchdog
 * will restart the service if the notification is not received.
 */
public class WatchdogNotifierTask implements Runnable {

  private static final String SYSTEMD_SO = "systemd";
  private static final String WATCHDOG_READY = "WATCHDOG=1";

  @Override
  public void run() {
    try {
      // Log.MAIN_LOG is the application's own logger; substitute your logging facade.
      int returnCode = SystemD.INSTANCE.sd_notify(0, WATCHDOG_READY);
      if (returnCode < 0) {
        // Negative values are errno-style error codes.
        Log.MAIN_LOG.error("Systemd watchdog returned a negative error code: " + returnCode);
      } else if (returnCode == 0) {
        // Zero means NOTIFY_SOCKET was not set, so nothing was sent.
        Log.MAIN_LOG.debug("NOTIFY_SOCKET is not set; no watchdog notification was sent.");
      } else {
        Log.MAIN_LOG.debug("Successfully updated systemd watchdog.");
      }
    } catch (Exception e) {
      Log.MAIN_LOG.error("calling sd_notify native code failed with exception: ", e);
    }
  }

  /**
   * This is a linux-specific interface to load the systemd shared library and call the sd_notify
   * function. Should we need other systemd functionality, it can be loaded here. It uses JNA for
   * native library calls.
   */
  interface SystemD extends Library {
    SystemD INSTANCE = (SystemD) Native.loadLibrary(SYSTEMD_SO, SystemD.class);

    int sd_notify(int unset_environment, String state);
  }
}
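
The task above still has to be run on a timer. A minimal sketch of one way to do that, assuming the unit sets WatchdogSec= so that systemd exports the timeout in microseconds in the WATCHDOG_USEC environment variable, and following the usual advice to notify at half that interval. The class and method names here (WatchdogInterval, pingInterval, schedule) are made up for illustration and are not part of the original code:

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: derives a watchdog ping interval and schedules a task at it. */
final class WatchdogInterval {

  private WatchdogInterval() {}

  /**
   * Takes the raw value of WATCHDOG_USEC (microseconds) and returns half of it,
   * or empty when the value is missing, not a number, or not positive.
   */
  static Optional<Duration> pingInterval(String watchdogUsec) {
    if (watchdogUsec == null) {
      return Optional.empty();
    }
    try {
      long micros = Long.parseLong(watchdogUsec.trim());
      if (micros <= 0) {
        return Optional.empty();
      }
      return Optional.of(Duration.of(micros, ChronoUnit.MICROS).dividedBy(2));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  /** Runs the task immediately and then repeatedly at the given interval. */
  static ScheduledFuture<?> schedule(
      ScheduledExecutorService executor, Runnable task, Duration interval) {
    // scheduleAtFixedRate rejects a period of zero, so clamp to at least 1 ms.
    long periodMillis = Math.max(1L, interval.toMillis());
    return executor.scheduleAtFixedRate(task, 0L, periodMillis, TimeUnit.MILLISECONDS);
  }
}
```

With this, something like WatchdogInterval.pingInterval(System.getenv("WATCHDOG_USEC")).ifPresent(d -> WatchdogInterval.schedule(executor, new WatchdogNotifierTask(), d)) would start the pings only when a watchdog is actually configured.
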
JdeBP

Anyone have any ideas why systemd-notify just doesn't work sometimes?

This is actually a long-standing problem in several systemd protocols, not just in the readiness notification protocol spoken by systemd-notify. The protocol for sending things directly to systemd's own journal also has this problem.

Both protocols attempt to find out stuff about the sending, client-end, process by reading things out of /proc/client-process-id/*. Unfortunately, systemd-notify is a short-lived program that exits as soon as it has sent the message to the server. So reading /proc/client-process-id/* does not yield the information about the client end that the server needs. In particular, the server cannot determine what (systemd) control group the client-end belongs to, and thus determine what service unit controls it, and thus determine whether it is a process that is allowed to send readiness notification messages.

As you have discovered, calling a library routine in-process in your actual dæmon, instead of forking a short-lived child process to run systemd-notify, avoids this problem, because of course your dæmon does not immediately exit after sending the notification. Be aware, however, that if you issue a readiness notification immediately before exiting your dæmon (as, ironically, some dæmons do in order to notify the world that they are terminating), you'll encounter this same problem even with an in-process library function.

There's no need to call a systemd library function as native code in order to speak this protocol, by the way. (And not using the library function gains you the advantage of speaking this protocol properly even if systemd isn't at the server end of it — a failing of the systemd library function.) It's not a hard protocol to speak in Java, and the systemd manual page describes the protocol. You look at an environment variable, open a datagram socket, use the variable's value for the name of the socket to send to, send a single datagram message, and then close the socket. Java is capable of this. ☺
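
As a rough illustration of the steps just listed, here is a sketch of the parts that need no native code at all: interpreting the environment variable's value and building the datagram payload. It assumes the variable is NOTIFY_SOCKET, that a leading "@" denotes a Linux abstract-namespace socket, and that the payload is a newline-separated list of VARIABLE=VALUE assignments, as the sd_notify manual page describes. The class and method names are hypothetical. The send itself is deliberately left out: as far as I know the JDK's built-in Unix-domain support (UnixDomainSocketAddress, Java 16 and later) covers stream channels only, so the actual datagram send needs a third-party socket library or a small native call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper: prepares what the notification protocol needs, without sending it. */
final class SdNotifyMessage {

  private SdNotifyMessage() {}

  /**
   * Takes the raw value of NOTIFY_SOCKET and returns the socket name,
   * or empty when the variable is unset or blank (nobody is listening).
   */
  static Optional<String> socketName(String notifySocketValue) {
    if (notifySocketValue == null || notifySocketValue.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(notifySocketValue);
  }

  /** True when the name denotes a Linux abstract-namespace socket ("@" prefix). */
  static boolean isAbstract(String socketName) {
    return socketName.startsWith("@");
  }

  /**
   * Bytes to place in the socket address path. For abstract-namespace names
   * the leading '@' is replaced by a NUL byte; file-system paths are unchanged.
   */
  static byte[] addressBytes(String socketName) {
    byte[] bytes = socketName.getBytes(StandardCharsets.UTF_8);
    if (isAbstract(socketName)) {
      bytes[0] = 0;
    }
    return bytes;
  }

  /** Joins VARIABLE=VALUE assignments with newlines into a single datagram payload. */
  static byte[] payload(String... assignments) {
    return String.join("\n", assignments).getBytes(StandardCharsets.UTF_8);
  }
}
```

A caller would pass System.getenv("NOTIFY_SOCKET") to socketName, and hand addressBytes(...) and payload("WATCHDOG=1") to whatever datagram-capable Unix socket API it has available.
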
