Checkpoint/restart using Core Dump in Linux

Question

Can Checkpoint/restart be implemented using the core dump of a process? The core file contains a complete memory dump of the process, thus in theory it should be possible to restore the process to the same state it was in when the core was dumped.

Answer 1:

No, this is not possible in general without special support from the kernel. The kernel maintains a LOT of per-process state, such as the file descriptor table, IPC objects, etc.

If you were willing to make lots of simplifying assumptions, such as no open files, no open sockets, no living IPC objects, no shared memory regions, and more, then in theory it would be possible, but in practice I don't believe it's possible with Linux even with those concessions.
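To make the point about kernel-side state concrete, here is a minimal Python sketch (not part of the original answer) that lists the open descriptors of the current process together with their current file offsets. The name `kernel_fd_state` and the `max_fd` limit are made up for illustration. Everything it reports is kept in the kernel's per-process tables rather than in the process's own address space, which is why a plain memory dump is not enough to bring it back.

```python
import os
import stat


def kernel_fd_state(max_fd=256):
    """Return {fd: (kind, offset)} for every open descriptor below max_fd.

    Hypothetical helper for illustration only: it probes descriptor
    numbers with fstat() and asks the kernel for the current offset.
    """
    state = {}
    for fd in range(max_fd):
        try:
            info = os.fstat(fd)
        except OSError:
            continue  # this descriptor number is not open

        if stat.S_ISREG(info.st_mode):
            kind = "file"
        elif stat.S_ISSOCK(info.st_mode):
            kind = "socket"
        elif stat.S_ISFIFO(info.st_mode):
            kind = "pipe"
        else:
            kind = "other"

        try:
            # Seeking by 0 from the current position only reads the offset.
            offset = os.lseek(fd, 0, os.SEEK_CUR)
        except OSError:
            offset = None  # pipes, sockets and some devices are not seekable

        state[fd] = (kind, offset)
    return state
```

Calling `kernel_fd_state()` in a process with a few open files shows the kind of state a restart tool would have to re-create by reopening each file and seeking to the saved offset; sockets and pipes are harder still, because the other end is involved too.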

Answer 2:

Yes, this is possible. GNU Emacs does this to optimize its startup time. It loads a bunch of Lisp files to produce an image and then dumps a core which can be restarted.

Several years ago, I created a patch for GNU Make 3.80 to do exactly the same thing (using code borrowed from GNU Emacs).

With this patch, you have a new option in make: make --dump. The utility now reads your Makefile, and then instead of executing the rules, it produces a core dump which can be restarted to do the actual build (evaluation of the parsed rule tree).

This was a saving, because the project was so large that loading all of the make rules across the source tree took thirty seconds! With this optimization, incremental builds launched almost instantly, without the half minute startup penalty.

No kernel support is required for this. What is required is knowledge about the structure of the core file.
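As a rough illustration of what "knowledge about the structure of the core file" involves, here is a small Python sketch that reads the ELF header of a core file and lists its program headers. This is not the code from the Make patch or from Emacs; it is only a sketch, and it assumes a 64-bit little-endian ELF core (as on x86-64 Linux). The function name `read_core_segments` is hypothetical. The PT_LOAD entries describe the dumped memory regions and the PT_NOTE entry holds register and process status notes, which is the information a restart tool would need to map the memory back and resume execution.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")      # Elf64_Phdr, 56 bytes
ET_CORE = 4                                      # e_type value for core files
SEGMENT_NAMES = {1: "PT_LOAD", 4: "PT_NOTE"}


def read_core_segments(path):
    """Return the program headers of a 64-bit little-endian ELF core file."""
    with open(path, "rb") as f:
        raw = f.read(ELF_HEADER.size)
        if len(raw) < ELF_HEADER.size:
            raise ValueError("file too short for an ELF header")
        fields = ELF_HEADER.unpack(raw)
        ident, e_type = fields[0], fields[1]
        e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if ident[4] != 2 or ident[5] != 1:
            raise ValueError("only 64-bit little-endian ELF is handled here")
        if e_type != ET_CORE:
            raise ValueError("ELF file is not a core dump")

        # Note: very large cores may store the real count elsewhere
        # (e_phnum == 0xffff); this sketch does not handle that case.
        segments = []
        for index in range(e_phnum):
            f.seek(e_phoff + index * e_phentsize)
            (p_type, p_flags, p_offset, p_vaddr,
             _p_paddr, p_filesz, p_memsz, _p_align) = PROGRAM_HEADER.unpack(
                f.read(PROGRAM_HEADER.size))
            segments.append({
                "type": SEGMENT_NAMES.get(p_type, hex(p_type)),
                "flags": p_flags,      # bit 0 = execute, 1 = write, 2 = read
                "offset": p_offset,    # where the bytes sit in the core file
                "vaddr": p_vaddr,      # where they were mapped in the process
                "filesz": p_filesz,
                "memsz": p_memsz,
            })
        return segments
```

Each PT_LOAD entry says "these bytes of the file belong at this virtual address", so restoring the memory image is largely a matter of mapping them back; it says nothing about descriptors or other kernel-side state.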

In addition to this approach, there was a process checkpointing project for Linux many years ago (wonder what happened to that).

Answer 3:

As I commented, you could look into application checkpointing and use a library such as Berkeley Lab Checkpoint/Restart (BLCR). However, these libraries don't use a core(5) dump file exactly, and they impose several limitations and conventions on what the checkpointed program can do and on what exactly is persisted in the checkpoint image (open file descriptors and network sockets usually cannot be persisted).

Some Unix systems (and perhaps some patched Linux kernels) had limited checkpoint facilities in the kernel itself (Cray Unix had some in the 1980s).

Answer 4:

Debian has a number of packages you might want to look at:

blcr-util - Userspace tools to Checkpoint and Restart Linux processes

This is related to BLCR (Berkeley Lab Checkpoint/Restart), see https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#whatisblcr

criu - checkpoint and restore in userspace https://criu.org/Main_Page

docker - supports checkpointing in recent versions, see https://criu.org/Docker

containerd - daemon to control runC

This contains a checkpointing facility that is interesting.

See also OpenVZ, which supports live migration: https://openvz.org/Checkpointing_and_live_migration

Source: https://stackoverflow.com/questions/16047636/checkpoint-restart-using-core-dump-in-linux

Tags

Linux

coredump

checkpoint