Implementing Basic Networking Constructs with Linux Namespaces

In this post, I will explain how to use Linux network namespaces to implement basic networking constructs such as an L2 switched network and a routed network.

Let's start by looking at the basic commands to create, delete, and list network namespaces on Linux.
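For reference, these are the standard iproute2 commands (shown here as a quick refresher; run them as root):

ip netns add ns1                  # create a namespace named ns1
ip netns list                     # list existing network namespaces
ip netns exec ns1 ip link show    # run a command inside ns1
ip netns delete ns1               # remove the namespace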

The next step is to create a LAN. We will use namespaces to simulate two nodes connected to a bridge, giving us a LAN inside the Linux host. We will implement a topology like the one shown below.

lan
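As a minimal sketch of this LAN setup (the namespace, interface, and address names below are my own and not from the original post), two namespaces can be wired to a Linux bridge with veth pairs:

ip netns add node1
ip netns add node2
ip link add name br0 type bridge
ip link set br0 up
ip link add veth1 type veth peer name veth1-br    # node1 <-> bridge
ip link add veth2 type veth peer name veth2-br    # node2 <-> bridge
ip link set veth1 netns node1
ip link set veth2 netns node2
ip link set veth1-br master br0 up
ip link set veth2-br master br0 up
ip netns exec node1 ip addr add 192.168.1.1/24 dev veth1
ip netns exec node1 ip link set veth1 up
ip netns exec node2 ip addr add 192.168.1.2/24 dev veth2
ip netns exec node2 ip link set veth2 up
ip netns exec node1 ping -c 3 192.168.1.2         # nodes should reach each other over the bridge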

Finally, let's see how to simulate a router connecting two LAN segments. We will implement the simplest possible topology, with just two nodes connected to a router on different LAN segments.

router
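And a minimal sketch of the routed setup (again with illustrative names and addresses, not the ones from the post): two namespaces on different subnets attached to a third "router" namespace that forwards between them.

ip netns add h1
ip netns add h2
ip netns add rtr
ip link add h1-eth0 type veth peer name rtr-eth0
ip link add h2-eth0 type veth peer name rtr-eth1
ip link set h1-eth0 netns h1
ip link set h2-eth0 netns h2
ip link set rtr-eth0 netns rtr
ip link set rtr-eth1 netns rtr
ip netns exec h1 ip addr add 10.0.1.2/24 dev h1-eth0
ip netns exec h2 ip addr add 10.0.2.2/24 dev h2-eth0
ip netns exec rtr ip addr add 10.0.1.1/24 dev rtr-eth0
ip netns exec rtr ip addr add 10.0.2.1/24 dev rtr-eth1
ip netns exec h1 ip link set h1-eth0 up
ip netns exec h2 ip link set h2-eth0 up
ip netns exec rtr ip link set rtr-eth0 up
ip netns exec rtr ip link set rtr-eth1 up
ip netns exec rtr sysctl -w net.ipv4.ip_forward=1   # let the router namespace forward packets
ip netns exec h1 ip route add default via 10.0.1.1
ip netns exec h2 ip route add default via 10.0.2.1
ip netns exec h1 ping -c 3 10.0.2.2                 # h1 reaches h2 across the router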


Test driving traffic shaping on Linux

In my last post, I shared a simple setup that does bandwidth limiting on Linux using TBF (Token Bucket Filter). The TBF-based approach applies a bandwidth throttle to the NIC as a whole.

The situation in reality can be more complex than what that post described. Users typically want to control bandwidth based on the type of application generating the traffic.

Let's take a simple example; a user may want to share the available bandwidth among application traffic as follows:

  • 50% of the bandwidth available to web traffic
  • 30% available to the mail service
  • 20% available to all other applications

Traffic Control on Linux provides ways to achieve this using classful queuing disciplines.

In essence, this type of traffic control is achieved by first classifying the traffic into different classes and then applying traffic shaping rules to each of those classes.

The Hierarchical Token Bucket Queuing Discipline

Although Linux provides various classful queuing disciplines, in this post we will look at the Hierarchical Token Bucket (HTB) to solve the bandwidth sharing and control problem.

HTB is essentially a hierarchy of TBFs (the Token Bucket Filter described in the last post) applied to a network interface. HTB works by classifying the traffic and applying a TBF to each individual class of traffic. So to understand HTB, we must first understand the Token Bucket Filter.

How Token Bucket Filter works

Let's take a deeper look at how the Token Bucket Filter (TBF) works.

TBF works by attaching a bucket of tokens to the network interface; each time a packet needs to be passed over the interface, a token is consumed from the bucket. The kernel produces these tokens at a fixed rate to fill up the bucket.

When traffic is flowing at a slower pace than the rate of token generation, the bucket fills up. Once full, the bucket discards any extra tokens generated by the kernel. The tokens accumulated in the bucket allow a burst of traffic (limited by the size of the bucket) to pass over the interface.

When traffic is flowing at a pace higher than the rate of token generation, packets must wait until the next token is available in the bucket before being allowed to pass over the network interface.

TBF

In the tc command line, the size of the bucket corresponds to the burst parameter, the rate of token generation corresponds to the rate parameter, and the latency parameter sets the maximum time a packet can wait in the queue for a token before being dropped.
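For example, a hypothetical 2mbit limit with a 10 KB bucket and a 70 ms queueing budget (illustrative values, not the ones used later in this post) would be expressed as:

tc qdisc add dev eth0 root tbf rate 2mbit burst 10kb latency 70ms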

The HTB queuing discipline

The following figure describes the working of HTB queuing discipline.

HTB

To apply the HTB discipline we have to go through the following steps:

  • Define different classes with their rate limiting attributes
  • Add rules to classify the traffic into the different classes

In this example we will implement the same traffic sharing requirements mentioned in the introduction: web traffic gets 50% of the bandwidth, mail gets 30%, and the remaining 20% is shared by all other traffic.

The following rules define the various classes with their traffic limits:

tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 50kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 20kbps ceil 100kbps

Slide1

Now we must classify the traffic into these classes based on match conditions. The following rules classify web and mail traffic into classes 1:10 and 1:20. All other traffic is pushed to class 1:30 by default.

tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 80 0xffff flowid 1:10 
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 25 0xffff flowid 1:20
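To check whether traffic is actually landing in the intended classes, the per-class counters can be inspected (a quick check, not part of the original post):

tc -s class show dev eth0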

Verifying the results

We will use iperf to verify the results of the traffic control changes. Use the following command to start 2 instances of iperf on the server

iperf -s -p <port number> -i 1

For our example, we use the following commands to start the servers

iperf -s -p 25 -i 1
iperf -s -p 80 -i 1

The clients are started with the following commands

iperf -c <server ip> -p <port number> -t <time period to run the test>

For this example, we used the following commands to run the test for 60 seconds against the server IP 192.168.90.4.

iperf -c 192.168.90.4 -p 25 -t 60
iperf -c 192.168.90.4 -p 80 -t 60

Here is a snapshot of output from the test.

HTB

It shows web traffic close to 500kbps and mail traffic close to 300kbps, the same ratio we wanted for shaping the traffic. When excess bandwidth is available, HTB splits it in the same ratio configured for the classes.

Network Bandwidth Limiting on Linux with TC

On Linux, the traffic queuing discipline attached to a NIC can be used to shape the outgoing bandwidth. By default, Linux uses pfifo_fast as the queuing discipline. Use the following command to verify the setting on your network card:

# tc qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Measuring the default bandwidth

I am using the iperf tool to measure the bandwidth between my VirtualBox instance (10.0.2.15) and my desktop (192.168.90.4), which acts as the iperf server.

Start the iperf server with the following command

iperf -s

The client can then connect using the following command

iperf -c <server address>

The following figure shows the bandwidth available with the default queue setting.

TBF1

Limiting Traffic with TC

We will use the Token Bucket Filter to throttle the outgoing traffic. The following command sets an egress rate of 1024kbit with a latency of 50ms and a burst size of 1540 bytes.

# tc qdisc add dev eth0 root tbf rate 1024kbit latency 50ms burst 1540

Use the tc qdisc show command to verify the setting; the output should now list the tbf qdisc with the configured rate, latency, and burst size.

# tc qdisc show dev eth0

Verifying the result

I measured the bandwidth again to make sure the new queuing configuration was working, and sure enough, the result from iperf confirmed it.

TBF2

The following command shows the detailed statistics of the queuing discipline
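The screenshot presumably comes from tc's statistics option; the equivalent command (assuming the eth0 setup above) would be:

tc -s qdisc show dev eth0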

TBF3

Impact of different parameters for Token Bucket Filter (TBF)

Decreasing the latency value leads to packet drops; the following figure captures the result after the latency was reduced to 1ms.

TBF5

Providing a large burst buffer defeats the rate limiting:

TBF4

TBF6

Both parameters need to be chosen carefully to avoid packet loss and traffic spikes beyond the rate limit.

Test driving CRIU – Live Migrate any process on Linux

CRIU (Checkpoint and Restore In Userspace) on Linux enables users to checkpoint and restore any live user-space process. This means the process state can be frozen at a point in time and stored as image files, which can later be used to restore the process.

Some interesting use cases that CRIU can support are:

  • Process persistence across server reboots: even after the server has been rebooted, the image files can be used to restore the process.
  • vMotion-like live migration for processes: the image files can be copied to another server and the process restored there.

In this post, I will explore using CRIU to checkpoint and restore a simple webserver process. We will also explore migrating the process across servers.

Let's start by installing the CRIU packages. I am using Ubuntu Vivid, and the package is available in the Ubuntu repository. We can use the following commands to find and install CRIU:

apt-cache search criu
apt-get install criu

The webserver process

For this experiment, I wanted a simple server process with an open network port and some internal state information to verify if CRIU can successfully restore the network connectivity as well as the state of the process.

I will start a simple webserver using this python script. The webserver keeps a count of the requests from clients and maintains an in-memory list of all the previous requests it has served. Create a new directory as shown below and change into it, then use wget to download the script.

chandan@chandan-VirtualBox:~$ mkdir criu
chandan@chandan-VirtualBox:~$ cd criu
chandan@chandan-VirtualBox:~/criu$ wget https://bitbucket.org/api/2.0/snippets/xchandan/jge8x/9464e8e341c4c845aebf3a21e9d20e472baa4c5e/files/server.py

Now start the webserver from this directory by executing the following command

chandan@chandan-VirtualBox:~/criu$ python server.py 8181

Verify that the webserver is running by pointing your browser to http://localhost:8181 and refresh the page a few times to build up the application's internal state. Every refresh should increase the request number.
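The same check can be done from the command line with curl, if you prefer:

curl http://localhost:8181/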

You should see output similar to this.

CRIU1

Keep this process running and open another terminal. Use the ps command to find the process ID (PID) of the webserver:

chandan@chandan-VirtualBox:~/criu/dump_dir$ ps aux|grep server.py
chandan 32601 0.0 0.1 40696 12760 pts/18   S+   20:40   0:00 python server.py 8181
chandan 32717 0.0 0.0   9492 2252 pts/1   S+   20:47   0:00 grep --color=auto server.py

We now have the PID of the webserver, which is 32601.

Checkpoint the webserver process

Checkpointing the webserver will freeze its process state and dump this state information into a directory. Create a new directory and change into it.
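Based on the shell prompts shown later in this post, the directory used here is ~/criu/dump_dir:

chandan@chandan-VirtualBox:~/criu$ mkdir dump_dir
chandan@chandan-VirtualBox:~/criu$ cd dump_dir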

Now execute the “criu dump -t <process id> --shell-job” command to checkpoint the process. The “--shell-job” flag is required if you want to use CRIU with processes started directly from a shell.

chandan@chandan-VirtualBox:~/criu/dump_dir$ sudo criu dump -t 32601 --shell-job

Once the dump completes, the directory will contain many new files, which store the state of the webserver process in the form of image files.

The dump command actually kills the webserver process; you can verify this with the ps and grep commands, or by trying to browse to the webserver address (which should now fail).
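For example (the curl request should now fail with a connection error):

ps aux | grep server.py
curl http://localhost:8181/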

CRIU2

NOTE: with the CLI option “--leave-stopped”, the dump command leaves the process in a stopped state instead of killing it. This way the process can be restored in case a migration fails.

Restoring the process

To restore the process, go to the directory where the image files are stored and execute the following command:

chandan@chandan-VirtualBox:~/criu/dump_dir$ sudo criu restore --shell-job

This command will not return, as it has now become the webserver process. Keep it running and verify that you can open the webserver URL.

You should see output similar to this: the request count continues from where it was before the checkpoint was taken. In this case we continue from Request No: 15, and all the state information is successfully restored, as shown in the screenshot.

CRIU3

Restoring after machine reboot

You can now reboot your machine and again try to restore the webserver process. You should be able to restore the process, and it should again continue from the checkpointed request number.

Migrating webserver to another machine

process-migration

To migrate the webserver process we need an exact match of the process's runtime environment on the target machine. This means the working directories and any resources like files, ports, etc. must be present on the target system. This is why process migration with CRIU makes more sense in a container-based environment, where the environment of the process can be closely controlled.

To start the migration, first copy the image files to the target machine

chandan@chandan-VirtualBox:~/criu$ scp -r dump_dir/ chandan@192.168.90.3:

Make sure that the environment for the process is present on the target machine. In my case, I had to create the current working directory for the webserver after CRIU prompted me with an error message.

chandan@chandan-ubuntu15:~/dump_dir$ sudo criu restore --shell-job
 32601: Error (files-reg.c:1024): Can't open file home/chandan/criu on restore: No such file or directory
 32601: Error (files-reg.c:967): Can't open file home/chandan/criu: No such file or directory
 32601: Error (files.c:1070): Can't open cwd
Error (cr-restore.c:1185): 32601 exited, status=1
Error (cr-restore.c:1838): Restoring FAILED.
chandan@chandan-ubuntu15:~/dump_dir$ 
chandan@chandan-ubuntu15:~/dump_dir$ mkdir ~/criu
chandan@chandan-ubuntu15:~/dump_dir$ sudo criu restore --shell-job
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET / HTTP/1.1" 200 -
192.168.90.2 - - [10/Aug/2015 01:26:47] code 404, message File not found
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET /favicon.ico HTTP/1.1" 404 -
192.168.90.2 - - [10/Aug/2015 01:26:47] code 404, message File not found
192.168.90.2 - - [10/Aug/2015 01:26:47] "GET /favicon.ico HTTP/1.1" 404 -

Here is a screenshot of the webserver restored on the remote machine, side by side with the local machine. Looking at the state info for Request No 15, you can see that both processes start with the same internal state and then continue on different paths.

CRIU4

Conclusion

In this post we saw how to checkpoint and restore any Linux application. We verified that the application could be restarted on a different server with its internal state restored.

In a future post I will explore using CRIU with containers to provide container migration.

Update:

I found this interesting post describing live migration of LXD/LXC containers, and a demo video of the live migration of a container running the game Doom. Here is one more post about live migration of a Docker container running Quake.

Test driving Policy Based Routing on Linux

Policy Based Routing allows a network admin to apply different forwarding rules to packets based on some of their attributes. For example, the source address, QoS, or firewall marks can be used as the packet classification policy.

Policy routing can help in meeting the requirements of differentiated services like QoS and security policy, where traffic from customers paying for a better or more secure service is allowed to use a different network route than regular traffic.

On Linux Policy Based Routing can be enabled in the following manner

  • The first step is to classify the traffic into different categories based on conditions like source address, destination address, QoS, or firewall mark.
  • Then use different routing tables to forward the traffic from the different categories created in the first step.

Slide2

In this post we will set up a Linux based router with policy routing. For the purpose of our experiment, we will implement policy based routing using the source address as the classifier. The following topology describes the setup for our test.

Slide1

Topology:

  • We have 4 networks connecting host1, host2 and server to the router:

    Network   Address        Gateway (Router)            Host Address
    Net1      10.0.10.0/24   10.0.10.1 (router-eth1)     10.0.10.2 (host1-eth0)
    Net2      10.0.11.0/24   10.0.11.1 (router-eth2)     10.0.11.2 (host2-eth0)
    Net3      10.0.12.0/24   10.0.12.1 (router-eth3)     10.0.12.2 (server-eth0)
    Net4      10.0.13.0/24   10.0.13.1 (router-eth4)     10.0.13.2 (server-eth1)
  • The server holds the IP 11.0.14.1/24, which is the destination the hosts are trying to reach.
  • The router has two paths to forward the packets arriving from host1 and host2 to destination 11.0.14.1, either using the 10.0.12.0/24 or 10.0.13.0/24 network.
  • To begin with, the server and the router are configured with static routes that use only one link (router-eth3 <-> server-eth0, i.e. the 10.0.12.0/24 network) to forward traffic between the router and the server, as sketched below.
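For reference, the initial single-link routing presumably looks something like the following; this is my reconstruction, inferred from the routes that get removed further down (the authoritative version is in the policy_routing_disabled.py script):

mininet> router ip route add 11.0.14.0/24 dev router-eth3
mininet> server ip route add 10.0.10.0/24 via 10.0.12.1
mininet> server ip route add 10.0.11.0/24 via 10.0.12.1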

pr4

The topology is implemented using mininet network simulator. The code for the topology is available here.

Start the simulated topology using the following command (using the policy_routing_disabled.py file from the repository).

sudo python policy_routing_disabled.py

We will start with a topology where policy routing is disabled and walk through the steps needed to enable it. (If you want a ready-made mininet topology with policy routing, look at the policy_routing_enabled.py file, which implements this.)

The mininet configuration implements the host and server nodes as network namespaces, while the router is implemented by the Linux host itself.

Once the topology is ready try to ping the IP 11.0.14.1 located on the server from the hosts.

Note: In the mininet CLI, a command can be run on a node by providing the node name followed by the command, in the form “<node_name> <command> <argument1> <argument2> …”

mininet> host1 ping 11.0.14.1
mininet> host2 ping 11.0.14.1

Without policy routing in place, the packets are forwarded using only one of the paths (the 10.0.12.0/24 network, net3). You can verify this by running the following command on the router (as the Linux host is acting as the router, the command can be started in another terminal).

$ watch "egrep router-eth.* /proc/net/dev"

pr5

Note that the interface statistics of router-eth4 do not change when ping packets are transmitted from either host1 or host2 to the server.

Let's remove the static route setting on the router and modify the one on the server as shown below.


mininet> router route del -net 11.0.14.0/24 dev router-eth3
mininet> server ip route del 10.0.11.0/24 via 10.0.12.1
mininet> server ip route add 10.0.11.0/24 via 10.0.13.1

We can now proceed to enable policy routing.

Enabling policy routing

Let's now start configuring policy routing.

  1. Add the routing tables. The table names must be registered in /etc/iproute2/rt_tables before they can be referenced by name in the rules below.
mininet> router echo "101 net1_table" >> /etc/iproute2/rt_tables
mininet> router echo "102 net2_table" >> /etc/iproute2/rt_tables
  2. Classify the traffic arriving on the router so that it uses different routing tables. The following commands add the classification policy on the router, which in this case is based on the source network.
mininet> router ip rule add from 10.0.10.0/24 lookup net1_table
mininet> router ip rule add from 10.0.11.0/24 lookup net2_table
  3. Add routes to the individual tables. The following routes add different paths to reach the destination network 11.0.14.0/24.
mininet> router ip route add 11.0.14.0/24 via 10.0.12.2 table net1_table
mininet> router ip route add 11.0.14.0/24 via 10.0.13.2 table net2_table

With the above settings in place, the routing rules from net1_table are applied to packets arriving from network 10.0.10.0/24 (net1), and rules from net2_table are applied to packets arriving from network 10.0.11.0/24 (net2).
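To inspect what has been configured, the rules and the per-table routes can be listed on the router (standard iproute2 commands, not shown in the original post):

mininet> router ip rule show
mininet> router ip route show table net1_table
mininet> router ip route show table net2_table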

Verifying the result

Use the following command at the mininet prompt to start an xterm connected to the router.

mininet> xterm router

Use the xterm to start a watch command on all interfaces of the router

(router xterm) $ watch "egrep router-eth.* /proc/net/dev"

Now start a ping from host1 to 11.0.14.1 and check the output to verify that the packets are forwarded over the eth3 interface on the router.

mininet> host1 ping 11.0.14.1

The same steps can be used to verify that if the ping is started from host2 to 11.0.14.1, the packets are forwarded over the eth4 interface on the router.

Here are the example outputs.

Initial stats:

pr1

Send 10 Ping packets from Host1:

pr2

Send 10 Ping packets from Host2:

pr3

The results show that traffic from host1, which arrives on router interface router-eth1, is forwarded over the 10.0.12.0/24 network through router interface router-eth3, while traffic from host2, which arrives on router interface router-eth2, is forwarded over the 10.0.13.0/24 network through router interface router-eth4.

The following figure shows the change in traffic flow due to the policy based routing configuration.

polocy-routing

The final setup with policy routing enabled is also available as the mininet configuration policy_routing_enabled.py in the repository.

Conclusion

In this experiment we started with a setup where the router had two paths to the destination server but was using only one of them to forward traffic. With the help of policy routing we were able to utilize both paths to communicate with the server.

The traffic was distributed based on the source address, i.e. packets coming from net1 (10.0.10.0/24) are forwarded over net3 (10.0.12.0/24), while those from net2 (10.0.11.0/24) are forwarded over net4 (10.0.13.0/24).