Test-driving EVPN route publishing with GoBGP

In recent times there has been a lot of interest in tunnel-based L2 networks, especially for cloud networks implemented with VXLAN. Tunnel-based networks were initially proposed to get around the 4K limit on the number of segments in VLAN-based networks.
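The 4K limit comes from the 12-bit VLAN ID in the 802.1Q tag. VXLAN instead carries a 24-bit VXLAN Network Identifier (VNI), as defined in RFC 7348. The quick calculation below compares the two ID spaces.

```python
# 802.1Q VLAN ID is 12 bits; IDs 0 and 4095 are reserved.
VLAN_ID_BITS = 12
# VXLAN Network Identifier (VNI) is 24 bits (RFC 7348).
VNI_BITS = 24

usable_vlans = 2**VLAN_ID_BITS - 2
vxlan_segments = 2**VNI_BITS

print(f"Usable VLAN IDs: {usable_vlans}")    # 4094
print(f"VXLAN segments:  {vxlan_segments}")  # 16777216
```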

EVPN-based VXLAN networks use BGP as the control plane for L2 learning. In this post we will test GoBGP for publishing L2 routes.

L2 learning in tunneled networks

Tunnel-based networks need special handling of L2 learning. L2 learning means learning how to reach a given MAC (and IP) address in the network. In a normal VLAN-based network this is done in the data path. When two hosts in a VLAN communicate, the switch uses the source MAC address of each packet to learn which of its ports a MAC address is connected to. This normally happens during the ARP resolution phase, which resolves an IP (L3) address to a MAC (L2) address. ARP floods a request to all ports of the switch to ask for the MAC address associated with the requested IP address.
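Data-path learning, as described above, can be modelled in a few lines. This is a toy sketch and not any real switch implementation; the class and method names are hypothetical. The switch learns the source MAC of every frame against its ingress port. It floods broadcasts and unknown destinations and forwards only to the learned port once the destination is known.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of data-path MAC learning on a single switch (hypothetical names)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn from one frame and return the ports it is forwarded to."""
        # Learn: the source MAC is reachable through the ingress port.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast (e.g. an ARP request) or unknown destination: flood.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment: filter
        return [out_port]


sw = LearningSwitch(ports=[1, 2, 3, 4])
# Host A on port 1 broadcasts an ARP request for host B: flooded.
print(sw.receive(1, "aa:aa:aa:aa:aa:aa", BROADCAST))            # [2, 3, 4]
# Host B on port 3 replies to A: A's location was learned, no flood.
print(sw.receive(3, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # [1]
```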

In a tunnel-based network each endpoint forms a full mesh of tunnels with all other endpoints. Unfortunately, data path learning is very costly in a tunnel-based network, because every flood has to be replicated over all the tunnels.
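A rough calculation shows why flooding over a full mesh is expensive. The sketch assumes head-end (ingress) replication, where the sending VTEP copies a flooded frame onto each unicast tunnel. It ignores underlay multicast, which some VXLAN deployments use instead.

```python
def full_mesh_tunnels(n_vteps: int) -> int:
    """Point-to-point tunnels needed so every VTEP can reach every other VTEP."""
    return n_vteps * (n_vteps - 1) // 2


def flood_copies(n_vteps: int) -> int:
    """Copies one VTEP must send to flood a single frame, assuming
    head-end (ingress) replication over unicast tunnels."""
    return n_vteps - 1


for n in (3, 10, 100):
    print(n, full_mesh_tunnels(n), flood_copies(n))
# 3 3 2
# 10 45 9
# 100 4950 99
```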

Control Plane based L2 learning

Tunnel-based networks instead implement control plane based L2 learning. Control plane based learning works by proactively publishing the location of each MAC (L2) address on the overlay network to all the tunnel endpoints.

To implement control plane based learning, something that knows where each MAC (L2) address is located on the overlay network must send this information to all the tunnel endpoints. A central controller with complete knowledge of the overlay network can do this. This is how SDN controllers work: they have a complete view of the network and can publish L2 reachability information to all tunnel endpoints.

The EVPN solution uses a combination of data path and control plane learning. Each tunnel endpoint implements a local data path learning and publishes the learned L2 reachability information as L2 routes to the remote endpoints. It uses BGP to publish local and learn remote L2 addresses.

Publishing a L2 route

An L2 route consists of a MAC address (and optionally the associated IP address) and a next hop, which is the IP address of the VTEP behind which that MAC lives. In other words, an endpoint that needs to send a packet to a MAC on the overlay network looks up the destination MAC in its local L2 route table. It then encapsulates the L2 packet and sends it to the next-hop VTEP IP.
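The lookup described above can be sketched as a table keyed by MAC address. The class names here are hypothetical and this is not GoBGP's internal data structure. The example route reuses the MAC, IP and VTEP values that appear later in this post.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class L2Route:
    mac: str
    ip: Optional[str]  # optional IP address bound to the MAC
    vtep: str          # next hop: IP of the VTEP behind which the MAC lives


class L2RouteTable:
    def __init__(self) -> None:
        self._routes: Dict[str, L2Route] = {}

    def install(self, route: L2Route) -> None:
        # Keyed by MAC; a newer advertisement replaces the old location.
        self._routes[route.mac.lower()] = route

    def next_hop(self, dst_mac: str) -> Optional[str]:
        """VTEP IP to tunnel a frame for dst_mac to, or None if unknown."""
        route = self._routes.get(dst_mac.lower())
        return route.vtep if route else None


table = L2RouteTable()
table.install(L2Route("aa:bb:cc:dd:ee:05", "2.2.2.5", "192.168.122.174"))
print(table.next_hop("AA:BB:CC:DD:EE:05"))  # 192.168.122.174
print(table.next_hop("aa:bb:cc:dd:ee:99"))  # None
```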

Publishing L2 routes with GoBGP

Now that the theory of L2 route publishing is clear, let’s look at a real example of L2 routes published over BGP.

GoBGP is an open source BGP implementation that supports publishing EVPN routes. The following figure shows the test topology. It consists of three BGP peers connected in a full mesh. The VTEP IPs are used to establish the BGP peerings.

slide1

The router details are as follows. These are KVM instances started using libvirt, and each one uses its management IP as both its router-id and its VTEP IP:

Device     VTEP IP
Router1    192.168.122.3
Router2    192.168.122.87
Router3    192.168.122.174

To install GoBGP refer to the install documentation available at

https://github.com/osrg/gobgp/blob/master/docs/sources/getting-started.md

We are going to use the following configuration file to start the BGP service on Router1. Similar configuration files are used to start the gobgpd instances on the other routers.

[global.config]
  as = 64512
  router-id = "192.168.122.3"
[[neighbors]]
[neighbors.config]
  neighbor-address = "192.168.122.87"
  peer-as = 64512
[[neighbors.afi-safis]]
  [neighbors.afi-safis.config]
  afi-safi-name = "l2vpn-evpn"
[[neighbors]]
[neighbors.config]
  neighbor-address = "192.168.122.174"
  peer-as = 64512
[[neighbors.afi-safis]]
  [neighbors.afi-safis.config]
  afi-safi-name = "l2vpn-evpn"
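The post only shows Router1's file. The sketch below renders the same layout for all three routers from the VTEP IP table, peering each router with the other two over iBGP in AS 64512. The script and its function name are illustrative and not part of GoBGP.

```python
ROUTERS = {
    "Router1": "192.168.122.3",
    "Router2": "192.168.122.87",
    "Router3": "192.168.122.174",
}
AS_NUMBER = 64512


def gobgpd_conf(router: str) -> str:
    """Render a gobgpd.conf in the same layout as the Router1 example."""
    local_ip = ROUTERS[router]
    lines = [
        "[global.config]",
        f"  as = {AS_NUMBER}",
        f'  router-id = "{local_ip}"',
    ]
    # Full mesh: peer with every other router (same AS, so iBGP).
    for peer, peer_ip in ROUTERS.items():
        if peer == router:
            continue
        lines += [
            "[[neighbors]]",
            "[neighbors.config]",
            f'  neighbor-address = "{peer_ip}"',
            f"  peer-as = {AS_NUMBER}",
            "[[neighbors.afi-safis]]",
            "  [neighbors.afi-safis.config]",
            '  afi-safi-name = "l2vpn-evpn"',
        ]
    return "\n".join(lines) + "\n"


# Print the file for Router2; save it as gobgpd.conf on that VM.
print(gobgpd_conf("Router2"))
```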

To start the gobgpd instance use the following command

# gobgpd -f gobgpd.conf

gobgp0

Check that all the neighbors have established a BGP session and that the EVPN address family (l2vpn-evpn) is supported:

gobgp1

To publish and view an L2 route, use the following commands:

# gobgp global rib add macadv aa:bb:cc:dd:ee:04 2.2.2.4 1 1 rd 64512:10 rt 64512:10 encap vxlan -a evpn
# gobgp global rib -a evpn

Here is an example

Router3 >gobgp global rib add macadv aa:bb:cc:dd:ee:05 2.2.2.5 1 1 rd 64512:10 rt 64512:10 encap vxlan -a evpn
Router3 >gobgp global rib -a evpn
    Network                                                                                            Next Hop             AS_PATH              Age        Attrs
*>  [type:macadv][rd:64512:10][esi:single-homed][etag:1][mac:aa:bb:cc:dd:ee:05][ip:2.2.2.5][labels:[1]]0.0.0.0                                   00:00:02   [{Origin: ?} {Extcomms: [64512:10], [VXLAN]}]

The following screenshot captures the output:

gobgp2.png

The above route tells the other routers to send any L2 packet destined for “aa:bb:cc:dd:ee:05” over a VXLAN tunnel to the VTEP IP of Router3. On Router3 itself the route is locally originated, which is why its next hop is shown as 0.0.0.0.

Let’s check that the L2 route shows up on the other routers. The following route is learned through BGP on Router1.

Router1 >gobgp global rib -a evpn
    Network                                                                                            Next Hop             AS_PATH              Age        Attrs
*>  [type:macadv][rd:64512:10][esi:single-homed][etag:1][mac:aa:bb:cc:dd:ee:05][ip:2.2.2.5][labels:[1]]192.168.122.174                           00:00:38   [{Origin: ?} {LocalPref: 100} {Extcomms: [64512:10], [VXLAN]}]

The following screenshot captures the output:

gobgp3

The route says: an L2 packet destined for “aa:bb:cc:dd:ee:05” should be encapsulated in a VXLAN tunnel and sent to 192.168.122.174 (the VTEP IP of Router3).
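The RIB lines shown above pack every EVPN field into bracketed key:value pairs, with the next hop printed right after the labels field. The sketch below extracts the MAC, IP and next hop from such a line. The regular expressions match only the output format shown in this post, which may differ between GoBGP versions, and the function name is hypothetical.

```python
import re

# Fields as printed by "gobgp global rib -a evpn" in the output above.
FIELD_RE = re.compile(r"\[(type|rd|esi|etag|mac|ip):([^\]]*)\]")
# The labels field nests brackets and is followed directly by the next hop.
LABELS_RE = re.compile(r"\[labels:\[([^\]]*)\]\]\s*(\S+)")


def parse_macadv(line: str) -> dict:
    """Extract the MAC/IP advertisement fields and next hop from one RIB line."""
    route = dict(FIELD_RE.findall(line))
    m = LABELS_RE.search(line)
    if m:
        route["labels"] = m.group(1)  # kept as printed, e.g. "1"
        route["next_hop"] = m.group(2)
    return route


router1_line = (
    "*>  [type:macadv][rd:64512:10][esi:single-homed][etag:1]"
    "[mac:aa:bb:cc:dd:ee:05][ip:2.2.2.5][labels:[1]]192.168.122.174"
    "   00:00:38   [{Origin: ?} {LocalPref: 100} {Extcomms: [64512:10], [VXLAN]}]"
)
r = parse_macadv(router1_line)
print(r["mac"], "->", r["next_hop"])  # aa:bb:cc:dd:ee:05 -> 192.168.122.174
```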

Conclusion

The above demonstration shows how EVPN uses a BGP-based control plane to distribute L2 routes. This approach removes the need for a centralized controller, which can become a single point of failure and a bottleneck when scaling overlay networks.

 


A simple metadata server to run cloud images on standalone libvirt :: KVM Hypervisor

With all the interest in cloud computing and virtualization, OS vendors are providing ever easier ways to deploy VMs. Most of them now ship cloud images, which makes it really easy to deploy VMs with the distro of your choice on a cloud platform like OpenStack or AWS.

Here are a few sources of cloud images:

https://cloud-images.ubuntu.com/

https://getfedora.org/en/cloud/download/

http://cloud.centos.org/centos/

Complete Lock-down

But, what about users wanting to deploy virtual machines on standalone Hypervisors? With so many pre-built images available, why should anyone need to install virtual machines from scratch?

With this thought in mind I decided to give these cloud images a try on my Hypervisor. I use libvirt with KVM on a physical server as my Hypervisor.

I quickly realized that these images are completely locked down: there is no default password and no web interface, and most of them are configured to use a serial console.

Auto-configuration daemon

These VM images include a configuration daemon called cloud-init. When a VM based on one of these images boots for the first time, cloud-init tries to retrieve configuration information from various data sources. It then sets up things like the password for the default user, the hostname and SSH keys. A complete manual of cloud-init is available here.

cloud-init can retrieve these configuration settings from sources like a ConfigDrive, a CD-ROM attached to the VM (NoCloud) or the network (EC2).

A Simple Solution

In my previous attempts to use a cloud image I followed the CDROM (attached ISO) based approach to provide the configuration data to the VMs. This quickly gets very cumbersome.

So the obvious idea was to mimic the metadata service provided by cloud platforms. The metadata service is a web service that provides configuration data to cloud images. As a cloud image boots, it sends a DHCP request to get an IP address. It then contacts the metadata service on the network and tries to retrieve the configuration for the VM. The metadata service is expected to be available at the well-known IP address 169.254.169.254.
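This is not the mdserver code, which is a bottle app. It is a minimal stand-in that uses only Python's standard library http.server. It assumes the EC2-style path layout (/latest/meta-data/... and /latest/user-data) that cloud-init's EC2 datasource queries. The exact paths requested vary with the cloud-init version, so treat the table as a starting point. The hostname, instance ID, key and password values are placeholders.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder per-VM data (the real mdserver uses the libvirt domain name as hostname).
METADATA = {
    "/latest/meta-data/instance-id": "i-vm1",
    "/latest/meta-data/hostname": "vm1",
    "/latest/meta-data/public-keys/": "0=default",
    "/latest/meta-data/public-keys/0/openssh-key": "ssh-rsa AAAA... user@host",
    # cloud-config user-data: set the default user's password, allow password SSH.
    "/latest/user-data": "#cloud-config\npassword: password-test\n"
                         "chpasswd: {expire: False}\nssh_pwauth: True\n",
}


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = METADATA.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def serve(host="127.0.0.1", port=0):
    """Start the server in a background thread and return (server, port).
    On the hypervisor you would bind 169.254.169.254 port 80 instead (needs root)."""
    server = HTTPServer((host, port), MetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


if __name__ == "__main__":
    server, port = serve()
    url = f"http://127.0.0.1:{port}/latest/meta-data/hostname"
    print(urllib.request.urlopen(url).read().decode())  # vm1
    server.shutdown()
    server.server_close()
```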

The libvirt configuration provides a default network for the VMs. It also provides a DHCP service that allocates IP addresses to the VMs from the 192.168.122.0/24 subnet. All the VMs connect to the default network bridge, virbr0. This bridge also acts as the network gateway and is configured with the IP 192.168.122.1. The topology is shown in the image below.

Slide1

With some trial and error I came up with a simple metadata service that can be used on a standalone libvirt/KVM hypervisor to boot cloud images. The metadata service is a Python bottle app, so you will need to install the bottle web framework.

# pip install bottle

To make the metadata service work add the metadata IP to the bridge interface as follows:

# ip addr add 169.254.169.254 dev virbr0

Download the server code from

https://bitbucket.org/xchandan/md_server

Build and install the mdserver package:

# python setup.py bdist_rpm 
# rpm -ivh dist/mdserver-<version>.noarch.rpm

or

# python setup.py install

Then start the metadata server as follows

# systemctl start mdserver

or

# mdserver /etc/mdserver/mdserver.conf

Setting password and ssh-key

You can set the password and SSH public key in the /etc/mdserver/mdserver.conf file. An example can be found in the source tree at md_server/etc/mdserver/mdserver.conf.test:

[mdserver]
password = password-test

[public-keys]
default = ssh-rsa ....
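The file uses plain INI syntax. The sketch below reads it with Python's configparser to show which settings it carries. This is only an illustration; it does not show how mdserver itself loads the file, and the key material is left truncated as in the example.

```python
import configparser

# Same layout as the mdserver.conf example above (key material truncated).
SAMPLE = """
[mdserver]
password = password-test

[public-keys]
default = ssh-rsa ....
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

password = config.get("mdserver", "password")
# Each entry under [public-keys] is a named key; the example defines "default".
public_keys = dict(config.items("public-keys"))

print(password)                # password-test
print(public_keys["default"])  # ssh-rsa ....
```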

Note: The hostname of the VM is set to the libvirt domain name. Although mdserver can be run on ports other than 80, this is only useful for testing; the cloud images always contact 169.254.169.254 on port 80 for metadata.

With the metadata server running you should be able to start VMs with cloud images.

The simple metadata service is able to set the hostname, password for the default user, set SSH authorized key and enables password based SSH access.

I was able to test it with CirrOS, Ubuntu and Fedora images.

 

 

 

SSH Jump Host and Connection Multiplexing

Jump Hosts

I use libvirt as my primary hypervisor for test VMs and needed a way to connect to the VMs easily over SSH. Libvirt uses a private network and SNAT to connect the VMs to the outside world, so getting SSH access to the VMs requires port forwarding or DNAT.

I recently learned about the SSH jump host configuration. It uses the SSH ProxyCommand option to tunnel the SSH connection through intermediate hosts (this requires nc on the intermediate host). I found it very useful for connecting to my VMs hosted by libvirt/KVM on a private network. Here is the command that I use to connect to the VMs:

ssh -t -o ProxyCommand='ssh hypervisor_user@my-hypervisor1 nc vm1 22' vm_user@vm1

Even better, SSH allows multiple intermediate jump hosts in the path.

Here is another trick taken from the Gentoo wiki. Add the following configuration to your SSH config file at ~/.ssh/config:

Host *+*
   ProxyCommand ssh $(echo %h | sed 's/+[^+]*$//;s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/;s/:/ -p /') exec nc -w1 $(echo %h | sed 's/^.*+//;/:/!s/$/ %p/;s/:/ /')

With this config in place we can specify multiple intermediate jump hosts in the following format:

ssh user1%host1:port1+user2%host2:port2+host3:port3 -l user3
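The hop syntax above packs one hop per "+"-separated segment. Each segment has an optional "user%" prefix and an optional ":port" suffix, and the last segment is the final destination. The sketch below only shows how the specification breaks down; it does not reproduce the sed pipeline in the config. The default port of 22 is an assumption for illustration.

```python
def parse_hops(spec):
    """Split 'user1%host1:port1+user2%host2:port2+host3:port3' into
    (user, host, port) tuples; user and port are optional per hop."""
    hops = []
    for part in spec.split("+"):
        user, _, hostport = part.rpartition("%")  # user is "" when absent
        host, _, port = hostport.partition(":")
        hops.append((user or None, host, int(port) if port else 22))
    return hops


*jump_hosts, target = parse_hops("user1%host1:2201+user2%host2+host3")
print(jump_hosts)  # [('user1', 'host1', 2201), ('user2', 'host2', 22)]
print(target)      # (None, 'host3', 22)
```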

Connection Multiplexing

Connection multiplexing optimizes SSH connection setup between a client and a server when the client makes frequent requests to the server. Creating and tearing down a new SSH connection for each request incurs delays, so the client reuses an existing SSH connection instead.

ssh -M -S ~/.ssh/controlmasters/user1@server1:22 server1
ssh -S ~/.ssh/controlmasters/user1@server1:22 server1

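The first command starts a master connection (-M) that listens on the control socket given by -S. The second command sends its session over that socket instead of opening a new connection and authenticating again. The ~/.ssh/controlmasters directory must already exist. It is easier to set this up in the SSH config file; here is an example: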

 

Host Server1
       HostName server1
       ControlPath ~/.ssh/controlmasters/%r@%h:%p
       ControlMaster auto
       ControlPersist 10m

 

Ref: https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing

 

Test-driving multiboot on Raspberry Pi – without BerryBoot/Noobs

media-20160403.jpg

Recently I got a Raspberry Pi 3 board and wanted to try out various OS options on it. I quickly realized that to try a new OS I would need to block copy (dd) the OS image to my SD card every time. I am running short on micro SD cards, and they are limited in size too.

However, I have a bunch of USB sticks lying around unused, so I wondered whether they could be used instead.

A little research on the Internet turned up two prominent options: BerryBoot and Noobs. Both allow multibooting your Pi board with different OS distros. While this is a good enough solution, I wanted to know how things work internally. I also wanted to know whether there was a simple way to achieve multiboot without using any tools (mostly for my own learning).

On the Internet there is a lot of information on how to install and boot Linux from USB sticks for Raspberry Pi. The process is summarized in the following section.

Run Pi with Linux from USB stick

In simple words, booting Linux involves loading the kernel, which initializes the hardware, and then mounting the root filesystem, which has all the user applications. Usually the kernel images are kept in the first partition and this partition is mounted on /boot directory.

Going through the Pi documentation, it looks like Pi boards recognize the SD card as the only boot device. So the trick to run Linux on Pi from USB stick involves installing the kernel images on the SD card while keeping the root file system on the USB stick and providing the information about the root filesystem location to the kernel in the boot command line.

If we look at the space usage, a typical kernel image is only around 10 MB in size. Even with all the data in the /boot directory it stays within 30 MB. The root filesystem, on the other hand, can be much bigger depending on the user applications and data.

How boot loading works on the Pi

The boot process on the Pi expects the SD card to have a FAT32 first partition. To boot Linux, the kernel image must be present on this partition. For Pi 0 and Pi 1 models the default kernel image file name is kernel.img, and for Pi 2 and Pi 3 models it is kernel7.img. The boot loader looks for the kernel image file that matches your Pi model.

In addition to the kernel image, there are two configuration files, which are interesting to understand the booting process.

  • cmdline.txt
  • config.txt

The first file, cmdline.txt, holds the kernel command-line parameters passed when the kernel is started. It is the closest equivalent to the kernel command line in a GRUB/syslinux configuration.

Following is an example cmdline.txt (all parameters must be on a single line):

dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 elevator=deadline root=/dev/mmcblk0p2 rootfstype=ext4 fsck.repair=yes rootwait

The second file, config.txt, is the equivalent of BIOS settings for the Raspberry Pi SoC. Here is an example:

gpu_mem=128 
disable_overscan=1

The documentation for all the options is available at https://www.raspberrypi.org/documentation/configuration/config-txt.md

Simple non-destructive way to tryout multiple OS on Pi

Now that we have a broad understanding of the boot process and the config files, let’s look at how we can use them to try different OS distributions on a Pi 3 using USB sticks.

If you look at the documentation for the options supported by config.txt, you will notice that the kernel file name is configurable using the “kernel” parameter.

So if we combine the USB boot mechanics discussed in the previous section with the configurable kernel file name in config.txt, we have a way to keep multiple OSes on different USB sticks.
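The selection rule described above can be sketched as a small function. A "kernel=" line in config.txt overrides the model's default file name. This is a simplification under stated assumptions. The sketch ignores config.txt conditional sections such as [pi3] and lets the last "kernel=" line win. The file name kernel7-osmc.img is a hypothetical example.

```python
# Default kernel file names from the text: kernel.img (Pi 0/1), kernel7.img (Pi 2/3).
DEFAULT_KERNEL = {"pi0": "kernel.img", "pi1": "kernel.img",
                  "pi2": "kernel7.img", "pi3": "kernel7.img"}


def kernel_to_load(model: str, config_txt: str) -> str:
    """Kernel file the boot loader would pick (simplified sketch):
    a kernel= line overrides the default; this sketch takes the last one."""
    kernel = DEFAULT_KERNEL[model]
    for raw in config_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "kernel":
            kernel = value.strip()
    return kernel


print(kernel_to_load("pi3", "gpu_mem=128\ndisable_overscan=1\n"))      # kernel7.img
print(kernel_to_load("pi3", "gpu_mem=128\nkernel=kernel7-osmc.img\n"))  # kernel7-osmc.img
```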

And this is how to make it work…

Slide1.jpg

  • Use different USB sticks to hold the root filesystems of different OS distros. Most Pi distros use a two-partition layout: the first partition is the FAT32 boot partition, and the second is usually an ext4 root filesystem. So if you just dd the OS image onto the USB stick, your root filesystem will be the second partition. Assuming this is the only USB stick attached to the Pi, the root partition should be recognized as /dev/sda2.
  • Format your SD card with a sufficiently large FAT32 first partition. It also needs the Pi boot firmware files (such as bootcode.bin and start.elf), which you can copy from the boot partition of one of the images.
  • Store the kernel images for all the OS distros on this partition under different file names. You can get the kernel images from the first partition of each USB stick. If you attach both the SD card and the USB stick to a Windows machine, you should be able to just copy-paste the kernel images to the FAT32 partition on the SD card.
  • Update cmdline.txt to point to the root filesystem partition, e.g. root=/dev/sda2.
  • Finally, update config.txt to point to the kernel file name of the OS you currently want to boot.
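The last two steps can be scripted. The sketch below is a minimal version that assumes the SD card's FAT32 partition is mounted at the directory passed in. It sets a single kernel= line in config.txt and rewrites only the root= parameter in cmdline.txt, keeping that file on one line. The function name and the kernel file names are hypothetical.

```python
import tempfile
from pathlib import Path


def select_os(boot_dir: Path, kernel_file: str, root_dev: str = "/dev/sda2") -> None:
    """Point config.txt at kernel_file and cmdline.txt at root_dev."""
    config = boot_dir / "config.txt"
    lines = config.read_text().splitlines() if config.exists() else []
    # Drop any existing kernel= setting, then append the selected one.
    lines = [line for line in lines if not line.strip().startswith("kernel=")]
    lines.append(f"kernel={kernel_file}")
    config.write_text("\n".join(lines) + "\n")

    cmdline = boot_dir / "cmdline.txt"
    params = cmdline.read_text().split()
    if any(p.startswith("root=") for p in params):
        # Replace root= in place so the parameter order is preserved.
        params = [f"root={root_dev}" if p.startswith("root=") else p for p in params]
    else:
        params.append(f"root={root_dev}")
    cmdline.write_text(" ".join(params) + "\n")  # cmdline.txt stays a single line


if __name__ == "__main__":
    # Demo on a temporary directory standing in for the mounted SD card.
    with tempfile.TemporaryDirectory() as d:
        boot = Path(d)
        (boot / "config.txt").write_text("gpu_mem=128\ndisable_overscan=1\n")
        (boot / "cmdline.txt").write_text("console=tty1 root=/dev/mmcblk0p2 rootwait\n")
        select_os(boot, "kernel7-distro-a.img")
        print((boot / "config.txt").read_text())
        print((boot / "cmdline.txt").read_text())
```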

For this scheme to work you need to match the kernel filename for the distro configured in config.txt to the correct USB stick that you have attached to the Pi board.

NOTE: If you make a mistake, just attach your SD card to another machine and edit config.txt to fix it.

The above processes can obviously be scripted to make it user-friendlier, but my purpose for the exercise was to get an understanding of the boot process on Pi and have some fun 🙂

Test-Driving OSPF on RouterOS – Interoperability

So I wrote about OSPF on RouterOS in my previous post. It was a nice experiment to learn about routing protocols.

I wanted to take it a little further and test Interoperability of RouterOS with other open source solutions.

This post is an update to the previous one, in which I add OSPF neighbor nodes to the setup. I decided to use Quagga, the most talked-about open-source routing protocol suite, and XORP, the eXtensible Open Router Platform.

Updated Setup

The following is the updated setup for the interoperability test. I have added two new Ubuntu nodes as OSPF neighbors:

  • Quagga on Ubuntu
  • XORP on Ubuntu

Slide3.jpg

Configuration

Quagga

The following configuration was added to the Quagga node:

Screenshot from 2016-03-27 12:33:55.png

XORP

The XORP node did not advertise any new subnet but received OSPF updates.

XORP_Conf.png

Results

  • All the nodes could discover their neighbors

Screenshot from 2016-03-27 00:03:27.png

  • All nodes got route updates.

Screenshot from 2016-03-27 01:54:34.png

  • OSPF Traces

Screenshot from 2016-03-27 01:57:34.png

Test-driving OSPF on RouterOS

I came across RouterOS by MikroTik©, which provides advanced routing protocol support. Even better, they provide RouterOS in a virtual form factor called Cloud Hosted Router (CHR) that can be installed on hypervisors like KVM/VirtualBox/VMware.

Please review the licensing model at http://wiki.mikrotik.com/wiki/Manual:CHR#CHR_Licensing

This is perfect for learning and experimenting at home, so I decided to test OSPF routing with RouterOS.

The Setup

The following diagram describes my network setup. All of these are installed as VMs on my home desktop.

Slide2

The footprint of the router VMs is quite small: MikroTik© recommends 128 MB of RAM and 128 MB of disk as the minimal hardware requirements. I used virt-manager to set up the test network. Here is a typical VM configuration.

The actual setup, however, needs some hosts on the network to test connectivity after implementing OSPF. To keep things lightweight I used Linux network namespaces to simulate the hosts connected to the routers. Linux bridges were used to connect the routers and the hosts. The following figure shows the final setup.

Slide1
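The post does not list the namespace commands. The sketch below generates standard iproute2 commands for one simulated host under stated assumptions. It assumes a Linux bridge already exists (for example created with "ip link add br-r1 type bridge") and that the router's ether4 is attached to it. The namespace, interface and bridge names and the host address 10.10.0.10 are hypothetical. The gateway is Router1's ether4 address.

```python
def namespace_host_cmds(name, bridge, ip_cidr, gateway):
    """Shell commands that create a network namespace acting as a host
    plugged into an existing Linux bridge via a veth pair."""
    host_if, br_if = f"{name}-eth0", f"{name}-br"
    return [
        f"ip netns add {name}",
        f"ip link add {host_if} type veth peer name {br_if}",
        f"ip link set {host_if} netns {name}",       # host end goes into the namespace
        f"ip link set {br_if} master {bridge}",      # other end joins the bridge
        f"ip link set {br_if} up",
        f"ip netns exec {name} ip addr add {ip_cidr} dev {host_if}",
        f"ip netns exec {name} ip link set {host_if} up",
        f"ip netns exec {name} ip link set lo up",
        f"ip netns exec {name} ip route add default via {gateway}",
    ]


# Hypothetical host behind Router1 (ether4 is 10.10.0.1/24).
for cmd in namespace_host_cmds("h1", "br-r1", "10.10.0.10/24", "10.10.0.1"):
    print(cmd)
```

Run the printed commands as root on the hypervisor, then repeat with a 10.20.0.0/24 address and gateway 10.20.0.1 for a host behind Router2.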

OSPF Configuration

For testing purposes I restricted my setup to area 0 (the backbone), to which both routers are connected. The following configuration is used on the routers.

Router1

/routing ospf instance
set [ find default=yes ] router-id=10.0.1.1
/ip address
add address=192.168.122.101/24 interface=ether1 network=192.168.122.0
add address=10.0.12.1/24 interface=ether2 network=10.0.12.0
add address=10.0.1.1 interface=loopback network=10.0.1.1
add address=10.10.0.1/24 interface=ether4 network=10.10.0.0
/routing ospf network
add area=backbone network=10.0.12.0/24
add area=backbone network=10.10.0.0/24
/system identity
set name=router1
[admin@router1] >

Router2

/routing ospf instance
set [ find default=yes ] router-id=10.0.2.1
/ip address
add address=192.168.122.102/24 interface=ether1 network=192.168.122.0
add address=10.0.12.2/24 interface=ether3 network=10.0.12.0
add address=10.20.0.1/24 interface=ether4 network=10.20.0.0
add address=10.0.2.1 interface=loopback network=10.0.2.1
/routing ospf network
add area=backbone network=10.0.12.0/24
add area=backbone network=10.20.0.0/24
/system identity
set name=router2
[admin@router2] >
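As I understand RouterOS v6, a /routing ospf network entry enables OSPF on every interface whose address falls inside that prefix. The sketch below uses Python's ipaddress module to check which interfaces the two exports above put into the backbone. The data is copied from the exports, except that the loopback addresses are written as /32, which is an assumption for addresses given without a mask.

```python
import ipaddress

# Interface addresses and OSPF network statements from the exports above.
ROUTERS = {
    "router1": {
        "interfaces": {"ether1": "192.168.122.101/24", "ether2": "10.0.12.1/24",
                       "loopback": "10.0.1.1/32", "ether4": "10.10.0.1/24"},
        "ospf_networks": ["10.0.12.0/24", "10.10.0.0/24"],
    },
    "router2": {
        "interfaces": {"ether1": "192.168.122.102/24", "ether3": "10.0.12.2/24",
                       "ether4": "10.20.0.1/24", "loopback": "10.0.2.1/32"},
        "ospf_networks": ["10.0.12.0/24", "10.20.0.0/24"],
    },
}


def ospf_interfaces(router):
    """Interfaces whose address falls inside one of the OSPF network prefixes."""
    nets = [ipaddress.ip_network(n) for n in router["ospf_networks"]]
    return sorted(
        name for name, addr in router["interfaces"].items()
        if any(ipaddress.ip_interface(addr).ip in net for net in nets)
    )


for name, r in ROUTERS.items():
    print(name, ospf_interfaces(r))
# router1 ['ether2', 'ether4']
# router2 ['ether3', 'ether4']
```

Only the 10.0.12.0/24 link is shared, so that is where the adjacency forms. The host subnets behind ether4 are advertised, while the management interfaces and loopbacks are left out of OSPF.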

Config-1

Results

I was able to get OSPF running with RouterOS in no time. Here are the test results.

  • Routing tables on the routers

OSPF-route

  • Routing tables on the hosts

HOST-route

  • Ping tests

PING

  • OSPF Traces

OSPF-ROS

Test driving OpenWRT

Recently I have been looking at tools for managing and monitoring my home network. In my previous post I talked about using a Network Namespace to control the download limit.

Now I wanted to look at more advanced tools for the job. OpenWRT is a Linux based firmware, which supports a lot of networking hardware. I am exploring the possibility of flashing OpenWRT on my backup router at home.

To test OpenWRT I used a KVM image (which can be found here) and started a VM on my desktop. The following diagram shows the network topology.

Slide1

A little tweaking is required to make OpenWRT work with libvirtd. The idea is to push the incoming traffic through OpenWRT and apply traffic monitoring and policy there.

Libvirt runs a dnsmasq service that listens on the bridge virbr0 and provides DHCP IP addresses to the VMs. It also configures NAT rules for traffic leaving the VMs through virbr0.

  • For this test we remove the NAT rules on the bridge virbr0. All applications on the desktop communicate through this bridge to OpenWRT, which routes the traffic to the Internet.
  • I also stopped the odhcpd and dnsmasq servers running on OpenWRT and started a DHCP client on the LAN interface (br-lan) to request an IP address from libvirtd.

Once OpenWRT is booted you can login to the web interface of the router to configure it.

The following figure shows the networking inside OpenWRT router.

Slide2

The routing table on my desktop is as follows:

Screenshot from 2016-03-06 20:34:41

The routing table on the OpenWRT router is shown below:

Screenshot from 2016-03-06 20:34:29

OpenWRT allows installing extra packages to extend its functionality. I found packages like quagga and bird, which will be interesting to explore.

Screenshot from 2016-03-06 17:51:13.png

It provides traffic monitoring and classifications.

Screenshot from 2016-03-06 19:41:27

OpenWRT provides firewall configuration using iptables.

Screenshot from 2016-03-06 17:48:57

I will be exploring more of its features before deciding if I will flash it on my backup home router.

Rate Limiting ACT broadband on Ubuntu

ISPs have started to provide high-bandwidth connections, but the FUP (Fair Usage Policy) limit is still not enough (I am using ACT Broadband). If you spend most of your time on YouTube, the download limit is exhausted rather quickly.

As I use Ubuntu on my desktop, I decided to use TC to throttle my Internet bandwidth and bring my data usage under some control. Have a look at my previous posts about rate limiting and traffic shaping on Linux to learn about using TC.

Here is my modest network setup at home.

Slide1

The problem is that TC can only shape traffic going out of an interface (egress). Shaping on the desktop's own interface therefore does not limit the download bandwidth.

The Solution

To get around this problem I introduced a Linux network namespace into the topology. Here is how the topology looks now.

Slide2

I use this script to set up the upload/download bandwidth limits.
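The author's script is not reproduced here. The sketch below shows the idea under stated assumptions. Downloads leave the namespace on the interface facing the desktop, and uploads leave it on the interface facing the home router. A tbf qdisc on each of those egress interfaces therefore limits both directions. The interface names veth-desk and veth-wan are hypothetical, and the burst and latency values are only starting points. Note that tc's "kbit" means 1000 bits per second.

```python
def tbf_cmds(dev_down, dev_up, rate_kbit=1024, burst_kbit=32, latency_ms=400):
    """tc commands that shape egress on the two namespace interfaces:
    dev_down faces the desktop (limits downloads),
    dev_up faces the home router (limits uploads)."""
    return [
        f"tc qdisc replace dev {dev} root tbf rate {rate_kbit}kbit "
        f"burst {burst_kbit}kbit latency {latency_ms}ms"
        for dev in (dev_down, dev_up)
    ]


def mb_per_hour(rate_kbit):
    """Upper bound on data moved in one hour at this rate (MB = 10**6 bytes)."""
    return rate_kbit * 1000 / 8 * 3600 / 10**6


for cmd in tbf_cmds("veth-desk", "veth-wan"):
    print("ip netns exec shaper " + cmd)  # "shaper" is a hypothetical namespace name
print(mb_per_hour(1024))  # 460.8
```

At 1024 kbit/s in each direction, at most about 460.8 MB can be downloaded per hour.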

Results

Here are readings before and after applying the throttle

Before

media-20160302

After rate-limiting to 1024Kbps upload and download

media-20160302-1.png

Neutron Extension to add a console to Virtual Router

I have written a small extension to Neutron that adds a console to virtual routers. This can help tenants understand and debug the networking setup. The console allows a very limited set of commands to be executed within the virtual router (a Linux network namespace).

The demo shows a proof of concept of the idea. Although it shows the console working with Linux network namespaces, it can easily be adapted to other implementations.

The CLI of the console is very configurable.

Here is the video

Test driving LXD

Overview

LXD provides OS containers (unlike Docker, which provides application-level containers) and offers a more complete OS user-space environment. It is an improved version of LXC containers and provides image management, live migration of container instances and OpenStack integration.

LXD is composed of two components: the server daemon, called ‘lxd’, and the client, called ‘lxc’.

Installation

The latest version of LXD can be installed from the ppa repository

sudo add-apt-repository ppa:ubuntu-lxc/lxd-git-master
sudo apt-get update
sudo apt-get install lxd

Image repositories

You will need an LXC image to start your container. To get a container image, first add an LXC image repository location:

sudo lxc remote add images images.linuxcontainers.org --debug
LXD1
sudo lxc remote list

LXD5

Next you can import the LXC image

sudo lxd-images import lxc ubuntu trusty amd64 --alias ubuntu --alias Ubuntu
sudo lxc image list

LXD2

Starting your Linux Container instance

sudo lxc launch ubuntu u1
sudo lxc list

LXD3
sudo lxc exec u1 /bin/bash

LXD4