Control Plane for our virtual network

In the last two blogs, I have gone through the process of developing a VPN base virtual network. One thing that we ignored is the amount of configuration that we need to change to add or remove nodes or provision new edge routers.

While, some of these steps are part of the infrastructure provisioning, like connecting the routers over L3 links and the VPN tunnel setup. These can be considered as static one time activity, while others are very transient like adding a device to the network and adding another edge router.

To manage such a network manually is time consuming and error prone. In this blog, I will explore means to automate this activity.

Let’s talk about adding a device to our network

What does it take to add a new device to the network?

Slide4

In terms of provisioning

  • First we need to know the IP address associated to the device.
  • we need to add a host route to all the edge router (local and remote) to make the communication between this new device and the rest of the already present devices in the network.

The reverse needs to happen when a device is taken off the network. The host routes associated with the device needs to be removed.

How about adding an Edge Router?

Adding a new edge router is a little more involved. Let’s say we have to add a new edge router to an existing network and also add a new device to it (because that is mostly the only reason to add a new edge router).

In terms of provisioning

  • The new edge router must add host routes for all the devices existing in the network.
  • Then it must follow the same steps as described earlier to add the new device to the network.

Removing the edge router will mean removing all devices connected to the edge router. In most cases removing the edge router with devices connected may not be a valid case (but we can always think of system failure which can cause such a situation).

How to automate?

From the above discussion, we can see that the job of adding and removing devices and edge routers is made of two tasks.

  • Distributing information about the device joining and leaving the network. i.e. distributing host routes when the device is added to the network and retracting the route when the device leaves
  • Adding host routes to the edge routers. This step is more about pushing(programming) the route into the routing table of the edge router

This job description exactly fits the skills of a routing protocol 🙂

BGP as control plane

BGP well known for its ability to distribute routes and its scalability. Let’s explore how BGP can be used for distributing the routes for both the cases above. Fortunately, BGP is designed such a job and can already take care of cases like new edge router joining the network.

For our example, we will use GoBGP to distribute the host routes. All the edge routers run BGP and form peer with each other.

If you are interested in details, please look at my previous blogs exploring goBGP and how to publish routes.

Adding a device node must trigger a route publish on BGP.

Adding an edge router is automatically taken care by the new edge router forming BGP peering with rest of the edge routers and receiving all the existing host routes

Other Options: Using a Controller

The other option is to use a SDN controller which can listen for events like device joining /leaving or new edge routers joining the network and send commands to orchestrate the edge routers.

This is a more centralized and ground up approach, but it gives you the flexibility of designing your own event handlers.

We have already discussed about the events that needs to be handled in the previous section, I will not go in to the details of a controller design (not in this post :))

Programming the routes

The second task in automating the device join/leave was to programming the routes in the edge routers.  This can be done with agents running in the edge routers which monitors BGP or listens to the commands from the SDN controller and configures the routes.

Advertisements

Designing a L2 virtual network

In the previous blog, we saw how we can connect two devices in the same subnet over a L3 device. With some configuration to enable ARP Proxying and host routes on the L3 device we were able to simulate a L2 network over a L3 device.

The important thing to note from the previous blog is that we did not make any change on the device nodes. All changes were made to the router.

So, what can we achieve with this capability?

In this blog, I will be taking that experience to the next level by designing a virtual L2 network using routers.

But most importantly we will be using only common Linux utilities and kernel features to achieve this.

Connecting nodes across multiple routers

Let’s start with setting up a simple topology for this experiment. We have two routers each connecting 2 device nodes. The routers themselves are connected to each other using a L3 link.

Slide2

In a way, each router with nodes are replica of the setup from my first blogs. The end nodes are devices while the middle node are the routers.

Device Interface IP Address
Node_1 node_if0 20.0.0.1/24
Node_2 node_if0 20.0.0.2/24
Vrouter1 vr_if1

vr_if2

eth1

1.0.0.1/24

Node_3 node_if0 20.0.0.3/24
Node_4 node_if0 20.0.0.4/24
Vrouter2 vr_if1

vr_if2

eth1

1.0.0.2/24

Look at the commands from my previous blog to see how to bring up this topology.

Making it work

To make it work each router need to be configured with Proxy ARP (refer to previous blog). Each router must be configured with host routes for all the device nodes. This means we have host routes for devices directly connected to the router and also the ones that are connected via the second router.

Here is an example routing table listing with routes for directly connected devices.

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
1.0.0.0         0.0.0.0         255.255.255.0   U     0      0        0 eth1
20.0.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if1
20.0.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if2

Use the following commands to setup remote routes

vrouter1 > ip route add 20.0.0.3/32 nexthop via 1.0.0.2
vrouter1 > ip route add 20.0.0.4/32 nexthop via 1.0.0.2

Similar route configuration will be needed for vrouter2 to reach node_1(20.0.0.1) and node_2(20.0.0.2)

Caveat: I found that L2 learning did not work properly when I connected multiple routers. To rescue the situation, I added static ARP entries.

Manually creating the ARP entries for directly connected devices on the router. Use the following commands to setup the static ARP entries.

arp -s 20.0.0.${node_no} $node_mac

With the above taken care you should be able to ping nodes across the multiple routers.

node_1 > ping 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=62 time=437 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=62 time=1.02 ms
64 bytes from 20.0.0.3: icmp_seq=3 ttl=62 time=0.947 ms
^C
--- 20.0.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.947/146.511/437.563/205.804 ms

L3 fabric as a transport

Now that we have a workable L2 network that can span across multiple routers, is there anything else required to make this solution work in a generic environment?

If you look closely, you will realize that although the routers in our experiment are connected to each other over a L3 link, the host routes in our end routers depend on the knowledge of exact path connecting them, e.g. if we introduce a third router in between them our solution will not work. As the remote host routes now needs to be adjusted to account for the extra hop that was introduced by the third router. Additionally, all the routers in the path connecting the end routers need to be aware of the host routes for the device nodes.

The VPN tunnels to rescue

The way to solve this situation is to setup tunnels between the routers connecting the device nodes. In this way, the L3 path is isolated from knowing the host routes and the Host routes are isolated from the impact of changes in the L3 path that connects the end routers.

We can setup GRE tunnels between the end routers, but we can also make the connections over SSH tunnels.

Use the following commands for setting up GRE

vrouter1 > ip tunnel add tun0 mode gre remote 1.0.0.2 local 1.0.0.1 ttl 255
vrouter1 > ifconfig tun0 2.0.0.1/24 up

Or

Use the SSH based tunnel as described at

http://man.openbsd.org/ssh#SSH-BASED_VIRTUAL_PRIVATE_NETWORKS

So now we have out L3 Virtual Private Network which isolates us from impact of the changes in the path. Make sure that you use the tunneled path for reaching the remote nodes. To do this adjust the host routs to use the tunnel paths

vrouter1 > ip route add 20.0.0.3/32 nexthop via 2.0.0.2
vrouter1 > ip route add 20.0.0.4/32 nexthop via 2.0.0.2

Putting it all together

Finally, this is how the complete topology looks.

Slide3

Here is a summary of the IP addresses and interfaces on each node.

 

Device Interface IP Address
Node_1 node_if0 20.0.0.1/24
Node_2 node_if0 20.0.0.2/24
Vrouter1 vr_if1

vr_if2

eth1

tun0

1.0.0.1/24

2.0.0.1/24

Node_3 node_if0 20.0.0.3/24
Node_4 node_if0 20.0.0.4/24
Vrouter2 vr_if1

vr_if2

eth1

tun0

1.0.0.2/24

2.0.0.2/24

The routing table with tunnel paths enabled looks like the following.

vrouter-routing-table

And the pings are still working 🙂

node_1 > ping 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=62 time=290 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=62 time=1.01 ms
64 bytes from 20.0.0.3: icmp_seq=3 ttl=62 time=0.973 ms
64 bytes from 20.0.0.3: icmp_seq=4 ttl=62 time=1.17 ms
^C
--- 20.0.0.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.973/73.499/290.832/125.477 ms

 

Using a Router for Switching

We all know that Routers are Layer3 devices and switches are Layer2. So how can a Layer 3 device be used to connect two or more devices at Layer2 ?

In this blog, I will explore the mechanisms that make it possible to use a routing device as a switch.

The test setup

We start with a simple setup with three Linux Network Namespaces connected in series. The middle node acting as a Router and the end nodes acting as the devices that needs a layer 2 connectivity.

Use the following script for setting up the Namespaces and connections.

https://bitbucket.org/!api/2.0/snippets/xchandan/Bazz4/0a29fd79dc2ad2be4c816f8db788abfed3ac9282/files/router_topo_step1.sh

Configuration

The script adds the following IP addresses to the device nodes.

Device Interface IP Address
Node_1 node_if0 20.0.0.1/24
Node_2 node_if0 20.0.0.2/24
Vrouter1 vr_if1

vr_if2

None

None

In this setup, we have two Layer2 broadcast domain A and B connected using the router.

Slide1

It doesn’t work, testing what’s is going on

Now we have two devices in the same Subnet connected using a Router instead of a switch. Start a ping form one device node to the other and predictably this does not work. But running tcpdump can give you an insight on to the traffic flow.

node_1 > ping 20.0.0.2
PING 20.0.0.2 (20.0.0.2) 56(84) bytes of data.
From 20.0.0.1 icmp_seq=1 Destination Host Unreachable
From 20.0.0.1 icmp_seq=2 Destination Host Unreachable
From 20.0.0.1 icmp_seq=3 Destination Host Unreachable
^C
--- 20.0.0.2 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms

And following is the tcpdump output

vrouter1 > sudo tcpdump -i vr_if1 -n
tcpdump: WARNING: vr_if1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vr_if1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
13:29:08.265464 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:09.263145 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:10.263058 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:11.280501 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:12.278951 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28

The ARP requests are not resolving.

As the device nodes are part of the same subnet, when we try to ping one of the device node from another, the first device node sends ARP queries to find the Layer2 address for the destination IP. ARP uses Layer2 broadcast to for finding the MAC address of the destination. As our destination is not part of the same Layer2 domain the ARP query broadcast will never reach it and will never be answered.

The missing link that will make it work

We saw in the previous section the problem of unresolved ARP queries.

So how do we rescue this situation?

To make this setup work we will need to enable Proxy ARP on the Router. When Proxy ARP is enabled on the router interfaces, the interface starts responding to the ARP query requests from the device nodes with its own MAC (Layer2) address. In a way, it proxies for a device(with the destination IP) which is actually not present on the Layer2 network.

To enable proxy ARP on the router, use the following commands.

vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if1/proxy_arp
vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if2/proxy_arp

Taste of success

So now the ARP problem is resolved. The device node gets a reply of its ARP query with a MAC address which actually belongs to the interface of the router. The device now knows what MAC address associated to the destination IP.

But the pings are still failing.

This is because we have solved only half the problem.

What does the Router do with the packet that it received? Once the Layer2 frame arrives on the router, it removes the Layer2 header and looks at the IP address. It obviously does not hold the destination IP.

So, the router does what the router does with any other IP packet, looks up the routing table.

To make the ping work, we will have to add host routes to the destination IP address on the router.

The following commands will achieve our goal.

vrouter1 > ip route add 20.0.0.1/32 dev vr_if1
vrouter1 > ip route add 20.0.0.2/32 dev vr_if2

This is how the routing table on the router looks like

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
20.0.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if1
20.0.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if2

One more thing to keep in mind. Don’t forget to enable IP forwarding on your router.

vrouter1 > echo 1 > /proc/sys/net/ipv4/ip_forward

Finally, we can test the end to end traffic flow 🙂

 

node_1 > ping 20.0.0.2
PING 20.0.0.2 (20.0.0.2) 56(84) bytes of data.
64 bytes from 20.0.0.2: icmp_seq=1 ttl=63 time=0.167 ms
64 bytes from 20.0.0.2: icmp_seq=2 ttl=63 time=0.061 ms
^C
--- 20.0.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.061/0.114/0.167/0.053 ms