Test driving CRIU – Live Migrate any process on Linux

CRIU (Checkpoint/Restore In Userspace) on Linux lets users checkpoint and restore a live user-space process. The process state is frozen in time and stored as image files, and these images can later be used to restore the process.

Some interesting use cases that CRIU can support are:

  • Process persistence across server reboot: even after the server has been rebooted, the image files can be used to restore the process.
  • vMotion-like live migration for processes: the image files can be copied over to another server and the process can be restored on the new server.

In this post, I will use CRIU to checkpoint and restore a simple webserver process. We will also explore migrating the process across servers.

Let's start by installing the CRIU package. I am using Ubuntu Vivid, and the package is available in the Ubuntu repository. We can use the following commands to find and install CRIU:

apt-cache search criu
apt-get install criu

The webserver process

For this experiment, I wanted a simple server process with an open network port and some internal state information to verify if CRIU can successfully restore the network connectivity as well as the state of the process.

I will start a simple webserver using this Python script. The webserver keeps a count of the requests from clients and maintains an in-memory list of all the previous requests it served. Create a new directory and change to it, then use wget to download the script, as shown below.

chandan@chandan-VirtualBox:~$ mkdir criu
chandan@chandan-VirtualBox:~$ cd criu
chandan@chandan-VirtualBox:~/criu$ wget https://bitbucket.org/api/2.0/snippets/xchandan/jge8x/9464e8e341c4c845aebf3a21e9d20e472baa4c5e/files/server.py
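The downloaded script is not reproduced in this post. As a rough idea of what such a server involves, here is a minimal Python 3 sketch of a server that counts requests and keeps them in memory. The names RequestLog, LOG and Handler are made up for this illustration, and unlike the original script this sketch counts every path it is asked for.

```python
import sys
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RequestLog:
    """In-memory state: lost on a normal restart, kept by a checkpoint."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def record(self, client, path):
        """Store one request and return page text listing all requests so far."""
        with self._lock:
            self._entries.append((client, path))
            lines = ["Request No: %d" % len(self._entries)]
            lines += ["%d: %s %s" % (n, c, p)
                      for n, (c, p) in enumerate(self._entries, 1)]
        return "\n".join(lines)


LOG = RequestLog()  # hypothetical module-level state shared by all requests


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = LOG.record(self.client_address[0], self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Same invocation style as the original script: python3 sketch.py 8181
    HTTPServer(("", int(sys.argv[1])), Handler).serve_forever()
```

Because the request list exists only in process memory, a plain restart of this sketch would begin again at Request No: 1, which is what makes it a useful check for a checkpoint and restore.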

Now start the webserver from this directory by executing the following command:

chandan@chandan-VirtualBox:~/criu$ python server.py 8181

Verify that the webserver is running by pointing your browser to http://localhost:8181, and refresh the page a few times to build up the application's internal state. Every refresh should increase the request number.

You should see output similar to this.

[Screenshot: CRIU1]

Keep this process running and open another terminal. Use the ps command to find the process ID (PID) of the webserver:

chandan@chandan-VirtualBox:~/criu/dump_dir$ ps aux|grep server.py
chandan 32601 0.0 0.1 40696 12760 pts/18   S+   20:40   0:00 python server.py 8181
chandan 32717 0.0 0.0   9492 2252 pts/1   S+   20:47   0:00 grep --color=auto server.py

We now have the PID of the webserver, which is 32601.
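If you would rather script this lookup than read the ps output by eye, the same information is available under /proc. The following is only a sketch: find_pids is a hypothetical helper, and it matches any process whose command line contains the given text.

```python
import os


def find_pids(needle, proc_root="/proc"):
    """Return PIDs whose command line contains `needle` (this process excluded)."""
    wanted = needle.encode()
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = fh.read().split(b"\x00")
        except OSError:
            continue  # the process exited meanwhile or is not readable
        if any(wanted in arg for arg in argv):
            pids.append(int(entry))
    return sorted(pids)
```

Calling find_pids("server.py") on the machine above would be expected to return a list containing the webserver's PID, without the extra grep entry that shows up in the ps output.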

Checkpoint the webserver process

Checkpointing the webserver freezes its process state and dumps this state information into a directory. Make a new directory (dump_dir in this walkthrough) and change to it.

Now execute the "criu dump -t <process id> --shell-job" command to checkpoint the process. The "--shell-job" flag is required if you want to use CRIU with processes started directly from a shell.

chandan@chandan-VirtualBox:~/criu/dump_dir$ sudo criu dump -t 32601 --shell-job

When the command exits, the directory will contain many new files, which store the state of the webserver process in the form of image files.

The dump command actually kills the webserver process; you can verify this with the ps and grep commands. You can also verify it by trying to open the webserver address in your browser (which should fail).

[Screenshot: CRIU2]

NOTE: with the CLI option "--leave-stopped", the dump command leaves the process in a stopped state instead of killing it. This way the original process can be resumed if a migration fails.
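When the checkpoint is driven from a script, it helps to assemble the command line in one place. The sketch below is an illustration only: dump_command is a hypothetical helper, and it passes the image directory explicitly with CRIU's -D (--images-dir) option, which the walkthrough above does not use because it runs criu from inside the dump directory.

```python
def dump_command(pid, images_dir=".", leave_stopped=False):
    """Build the argument list for checkpointing a shell-started process."""
    cmd = ["criu", "dump", "-t", str(pid), "--shell-job", "-D", images_dir]
    if leave_stopped:
        # Keep the original process stopped rather than killed after the dump.
        cmd.append("--leave-stopped")
    return cmd


# Print the command instead of running it; running it needs root privileges.
print(" ".join(dump_command(32601, leave_stopped=True)))
```

The resulting list can be handed to subprocess.run(..., check=True) under sudo once you are happy with it.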

Restoring the process

To restore the process, go to the directory where the image files for the process are stored and execute the following command:

chandan@chandan-VirtualBox:~/criu/dump_dir$ sudo criu restore --shell-job

This command will not return, as it is now the web server process. Keep this process running and verify that you can open the webserver URL.

You should see output similar to this. The request count should continue from where it was before the checkpoint was made. In this case we continue from Request No: 15, and all the state information is successfully restored, as shown in the screenshot.

[Screenshot: CRIU3]
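The same check can be automated by reading the counter from the page before the dump and again after the restore. This is a sketch under assumptions: it assumes the page contains the text "Request No: N" somewhere in its body, and the function names are invented for this example.

```python
import re
import urllib.request


def request_number(page_text):
    """Extract N from 'Request No: N', or return None if it is absent."""
    match = re.search(r"Request No:\s*(\d+)", page_text)
    return int(match.group(1)) if match else None


def fetch_request_number(url="http://localhost:8181", timeout=5):
    """Fetch the page and return the request number it reports."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return request_number(response.read().decode("utf-8", "replace"))


def continues_from(before, after):
    """True if the counter kept going rather than starting over."""
    return before is not None and after is not None and after > before
```

A server that had been restarted from scratch would report a low number again, so continues_from(14, 1) is False while continues_from(14, 15) is True.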

Restoring after machine reboot

You can now reboot your machine and try to restore the webserver process again. You should be able to restore the process, and it should again continue from the checkpointed request number.

Migrating webserver to another machine

[Figure: process-migration]

To migrate the webserver process, the target machine needs an exact match of the process's runtime environment. This means the working directory and any resources the process uses, such as files and ports, must be present on the target system. This is why process migration with CRIU makes more sense in a container-based environment, where the environment of the process can be closely controlled.
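Some of these prerequisites can be checked on the target machine before attempting a restore. The sketch below covers only the two resources this webserver needs, a working directory and a free TCP port; check_environment is a hypothetical helper, not part of CRIU, and a real process may depend on much more.

```python
import os
import socket


def check_environment(workdir, port):
    """Return a list of problems that would likely block a restore."""
    problems = []
    if not os.path.isdir(workdir):
        problems.append("missing working directory: %s" % workdir)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # If the bind succeeds, nothing else is holding the port right now.
        sock.bind(("", port))
    except OSError as exc:
        problems.append("port %d unavailable: %s" % (port, exc.strerror))
    finally:
        sock.close()
    return problems
```

For the example in this post, the call would be check_environment("/home/chandan/criu", 8181) on the target machine; an empty list means neither of the two checks found a problem.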

To start the migration, first copy the image files to the target machine:

chandan@chandan-VirtualBox:~/criu$ scp -r dump_dir/ chandan@192.168.90.3:

Make sure that the environment for the process is present on the target machine. In my case, I had to create the current working directory for the webserver after CRIU reported an error.

chandan@chandan-ubuntu15:~/dump_dir$ sudo criu restore --shell-job
 32601: Error (files-reg.c:1024): Can't open file home/chandan/criu on restore: No such file or directory
 32601: Error (files-reg.c:967): Can't open file home/chandan/criu: No such file or directory
 32601: Error (files.c:1070): Can't open cwd
Error (cr-restore.c:1185): 32601 exited, status=1
Error (cr-restore.c:1838): Restoring FAILED.
chandan@chandan-ubuntu15:~/dump_dir$ 
chandan@chandan-ubuntu15:~/dump_dir$ mkdir ~/criu
chandan@chandan-ubuntu15:~/dump_dir$ sudo criu restore --shell-job
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET / HTTP/1.1" 200 -
192.168.90.2 - - [10/Aug/2015 01:26:47] code 404, message File not found
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET /favicon.ico HTTP/1.1" 404 -
192.168.90.2 - - [10/Aug/2015 01:26:47] code 404, message File not found
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET /favicon.ico HTTP/1.1" 404 -

Here is a screenshot of the webserver restored on the remote machine, side by side with the one on the local machine. Looking at the state info for Request No 15, you can see that both processes start from the same internal state and then continue on different paths.

[Screenshot: CRIU4]

Conclusion

In this post we saw how to checkpoint and restore a Linux application with CRIU. We verified that the application could be restarted on a different server with its internal state restored.

In a future post I will explore using CRIU with containers to provide container migration.

Update:

I found this interesting post describing live migration of LXD/LXC containers, and a demo video of the live migration of a container running the game Doom. Here is one more post about the live migration of a Docker container running Quake.

Published by

Chandan Dutta Chowdhury

Software Engineer
