Extending my Private LAN behind NAT to AWS

WordPress on EC2 with DB in private LAN

We have all heard about and discussed hybrid cloud solutions that spill over on-prem infrastructure to cloud providers like AWS. Recently I needed to do the same for a proof of concept (POC) in my private lab, where part of my software runs in the cloud while still communicating with the in-house components.

The standard solution for this is to set up a VPN connection between your private lab and AWS. The only problem is that I don’t have a public IP available to set up the tunnels, and I am also behind my ISP’s NAT. So the VPN solution does not work in my case.

Hence I used SSH to set up a poor man’s version of VPN tunnels.

The setup in detail

The basic idea is to have a pair of tunnel hosts, one in the VPC and one in your private lab, as seen in the figure below, and to use the tunnel between them to route traffic between the networks.

Tunnel

Setting it up

The following steps bring up the setup.

  • First, start the pair of tunnel hosts: one on the private LAN (TH1) and another in the VPC with a public IP (TH2). These hosts act as the tunnel gateways. I used Ubuntu-based VMs for the tunnel hosts.

In our case, the tunnel host on the private LAN acts as the SSH client that initiates the tunnel connection.

  • Update the SSH configuration file /etc/ssh/sshd_config on the server side (TH2) to allow the creation of tunnels by adding the following line, then restart the SSH service:
PermitTunnel yes

Note: SSH sessions can be terminated if there is no activity for a certain amount of time. I found some helpful suggestions to keep the connection alive; you can also read about these options in the sshd_config and ssh_config man pages. Here is the gist of it.

Update your SSH server (TH2) configuration with the following options:

ClientAliveInterval 120
ClientAliveCountMax 720

Update the client (TH1) configuration in ~/.ssh/config (the per-user file; the system-wide file is /etc/ssh/ssh_config) with the following:

ServerAliveInterval 120
  • Next, set up the tunnels using SSH connections.

As the tunnels are set up over an SSH connection, having a public IP (TH2_Public_IP) on the AWS side is enough. The client side can reside behind the ISP-provided NAT without a public IP. The SSH connections are set up as the root user.

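On the client (TH1):
     # ssh -i AWS_SSH_KEY.pem -f -w 0:0 <TH2_Public_IP> true
     # ifconfig tun0 1.1.1.10 netmask 255.255.255.0 up

tun10

Here -w 0:0 asks ssh to forward between tun0 on the client and tun0 on the server, and -f sends ssh to the background before it runs the dummy remote command true. Similarly, configure the tunnel interface on the AWS tunnel host: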

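On the server (TH2):
    # ifconfig tun0 1.1.1.20 netmask 255.255.255.0 up
  • Check that your tunnel connection is up by pinging the tunnel IPs (1.1.1.10 and 1.1.1.20) from each other.
  • Next, set up routing on the tunnel hosts to connect the private lab to your VPC. Both tunnel hosts must have IP forwarding enabled (sysctl -w net.ipv4.ip_forward=1), and on AWS the source/destination check must be disabled on the TH2 instance so that it can forward traffic that is not addressed to itself.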
On the client (TH1):
# route add -net 172.31.0.0/20 gw 1.1.1.20

On the server (TH2):
# route add -net 192.168.56.0/24 gw 1.1.1.10

Here is the route table on TH2 after the update:

route20.png

  • Then set up routes on the EC2 instances to redirect traffic for the private LAN to your tunnel host (TH2) on AWS, using its private IP (TH2_Private_IP). This step can be automated using AWS Systems Manager.

On each EC2 instance in the VPC:

# route add -net 192.168.56.0/24 gw <TH2_Private_IP>

The same changes need to be made on the private LAN hosts, using the private IP of the local tunnel host (TH1_Private_IP):

# route add -net 172.31.0.0/20 gw <TH1_Private_IP>

Alternatively, if you have access to your LAN’s gateway, you can add the route there instead of updating every individual host on your LAN.
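To double-check the routing before testing, it helps to trace which next hop each machine picks. The following Python sketch models the route tables from the steps above and applies longest-prefix match. TH1_PRIVATE_IP and TH2_PRIVATE_IP are made-up placeholder addresses, and the tables are a simplified model, not output captured from the real hosts.

```python
import ipaddress
from typing import Optional

# Placeholder addresses (assumptions): substitute your real private IPs.
TH1_PRIVATE_IP = "192.168.56.2"   # TH1 on the private LAN
TH2_PRIVATE_IP = "172.31.0.5"     # TH2 in the VPC

# Simplified route tables: (destination CIDR, next hop, interface or "local").
ROUTE_TABLES = {
    "ec2":      [("172.31.0.0/20", "local"), ("192.168.56.0/24", TH2_PRIVATE_IP)],
    "th2":      [("172.31.0.0/20", "local"), ("1.1.1.0/24", "tun0"),
                 ("192.168.56.0/24", "1.1.1.10")],
    "th1":      [("192.168.56.0/24", "local"), ("1.1.1.0/24", "tun0"),
                 ("172.31.0.0/20", "1.1.1.20")],
    "lan_host": [("192.168.56.0/24", "local"), ("172.31.0.0/20", TH1_PRIVATE_IP)],
}


def next_hop(host: str, dst: str) -> Optional[str]:
    """Pick the next hop for dst on host using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(cidr), hop)
        for cidr, hop in ROUTE_TABLES[host]
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None  # no matching route (these tables have no default route)
    return max(candidates, key=lambda c: c[0].prefixlen)[1]
```

For example, a packet from an EC2 instance to 192.168.56.20 is handed to TH2, TH2 forwards it to 1.1.1.10 over tun0, and TH1 then delivers it locally on the LAN.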

Test application: WordPress on EC2 with DB in private LAN

To test my setup, I ran a WordPress server on AWS while my DB remained on my local LAN.

You will have to make MySQL/MariaDB listen on all IPs of the machine (bind-address = 0.0.0.0), open up firewall rules to allow MySQL access on port 3306, and set up a DB user and database with the proper privileges.

mysql.png

Once done, start an EC2 instance in the VPC (I used an Ubuntu 18 image) and make sure that you can connect to MySQL on the private LAN.
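A quick way to confirm that the EC2 instance reaches the database through the tunnel is a plain TCP connection test to port 3306. This Python sketch only checks that the port accepts connections; it does not log in, so the DB user and privileges still have to be verified with the MySQL client.

```python
import socket


def port_reachable(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, port_reachable("192.168.56.10") run on the EC2 instance, with 192.168.56.10 standing in for your MySQL server’s LAN IP (a placeholder), should return True once the routes and firewall rules are in place.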

Also make sure that you add Security Group rules to allow access to the HTTP port for WordPress.

wordpress4.png

Install WordPress on this host along with PHP and a web server (I used Nginx). During the install, you should be able to use the private LAN IP of the MySQL server as the database host and complete the installation.

wordpress5

Once it is installed, you should be able to use the WordPress app as usual.

wordpress12

Conclusion

This is definitely not a production setup; I built it for my own experiments. The tunnel depends on the liveness of the SSH connection, and although we have configured keepalives to extend the timeout, the connection may still drop if the peers cannot communicate for longer than the timeout period.
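To put numbers on the keepalive settings: sshd sends a probe every ClientAliveInterval seconds and drops the client after ClientAliveCountMax consecutive unanswered probes, and the SSH client does the same with ServerAliveInterval and ServerAliveCountMax. ServerAliveCountMax was not set above, so it keeps its default of 3. Under those assumptions the limits work out as:

```latex
\begin{aligned}
T_{\text{server}} &= 120\ \text{s} \times 720 = 86400\ \text{s} = 24\ \text{h} \\
T_{\text{client}} &= 120\ \text{s} \times 3 = 360\ \text{s} = 6\ \text{min}
\end{aligned}
```

So in practice it is the client side on TH1 that gives up first, after about six minutes without a response from TH2; raising ServerAliveCountMax in the client config would extend that.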


Test Driving Inter Regional VPC peering in AWS

Connect AWS VPCs hosted in different regions.

AWS Virtual Private Cloud (VPC) provides a way to isolate a tenant’s cloud infrastructure. To a tenant, a VPC provides a view of their own virtual infrastructure in the cloud that is completely isolated and has its own compute, storage, network connectivity, security settings, etc.

In the physical world, Amazon’s data centers are organized into different geographical locations called Regions. Each Region has multiple Availability Zones, each made up of one or more data centers with independent power and connectivity, so that a failure or disaster in one zone does not take down the others.
A VPC is associated with a single Region. A VPC can have multiple subnets, and each subnet resides entirely within one Availability Zone; subnets cannot span Availability Zones.

What is VPC Peering ?

Although VPCs provide you, as an AWS tenant, with a secure way to isolate and control access to your cloud resources, they also create their own challenges. As your infrastructure grows, you are forced to create multiple VPCs within one Region or across multiple Regions. The question is: how do you connect instances deployed in different VPCs?

One way to connect your resources across VPCs is over public IPs. This means the traffic has to traverse the internet to travel between your cloud resources hosted in different Regions.

Peering solves this problem by allowing instances in different VPCs to communicate over a peering link. Until recently, VPC peering was only allowed within a single Region; AWS now also allows VPC peering across Regions. With Inter-Region VPC peering, traffic between instances in VPCs in different Regions never has to leave the Amazon network.

Note: One thing to keep in mind while peering VPCs is that the peered VPCs must not have overlapping CIDR blocks. AWS does not allow peering VPCs with overlapping CIDRs, and since peering relies on route table entries that point at the remote CIDRs, overlapping ranges could not be routed unambiguously anyway.
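The overlap rule is easy to check up front, before creating any peering request. Here is a minimal Python sketch using the standard ipaddress module; the VPC names are just labels, and apart from the two CIDRs used in this post, the addresses are example values.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(vpcs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the pairs of VPC names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpcs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]
```

With the two VPCs from this post, overlapping_pairs({"virginia": "10.0.1.0/24", "london": "10.0.2.0/24"}) returns an empty list, so they can be peered.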

Creating Inter Regional VPC peering

To test Inter Regional VPC peering, you will need to create two custom VPCs in different Regions. I have explored custom VPC and Internet access in my previous post. For this test I am peering custom VPCs created in N. Virginia (CIDR 10.0.1.0/24) and London (CIDR 10.0.2.0/24) Region.

The peering links can be created in the “VPC” dashboard under the “Networking & Content Delivery” heading. To create a peering link, navigate to

Networking & Content Delivery > VPC > VPC Peering Connections and click on “Create Peering Connection”.

VPC10

To create a peering connection, you must know the VPC IDs of the local VPC and of the peer VPC in the remote Region. While the local VPC can be chosen from the drop-down menu, you must look up the remote VPC ID in the remote Region’s VPC console. On completing the above step, a VPC peering request is created with the status pending.

VPC11

To activate the peering link, the VPC in the remote Region must accept the peering request.

VPC13

Click “Accept Request” and “Yes, Accept”

VPC14

Once accepted, the status of the peering link changes to Active.

VPC16

The next step is to update the route table of your VPC and add a route to the subnets of the peered VPC, with the VPC peering connection (named pcx-xxxxx…) as the target.

Note: The route table must be updated on both the local and remote VPC to provide a complete forward and reverse path for the traffic.

VPC17

And that’s all. Now you should be able to communicate between the peered VPCs. You can try running ping or SSH between the instances on the VPCs. You may have to adjust the Security group rules to make this work.

VPC18

No transitive routing with VPC peering and problem of manual route table update

One of the limitations of VPC peering is that it only allows communication between directly peered VPCs [as of Oct 2018].

As an example, if VPC-A peers with VPC-B and VPC-B peers with VPC-C, traffic is only allowed between VPC-A and VPC-B, and between VPC-B and VPC-C. The peering link is not transitive, which means VPC-A cannot send traffic to or receive traffic from VPC-C. To enable this, VPC-A must peer directly with VPC-C.

VPC22

This means that a group of VPCs that all need to communicate with each other must form a full mesh of peering links. Additionally, the route table of each VPC needs to be manually updated with the subnet CIDRs of all its peers and the correct peering link. This can be quite tricky to maintain with manual updates.
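A simplified model of the route tables shows both points. In this Python sketch (not the AWS API), each VPC gets a local route plus one route per directly peered VPC; the pcx-… names are hypothetical labels, since real peering connection IDs are assigned by AWS. With A-B and B-C peered, A has no route to C, and a full mesh of n VPCs needs n(n-1)/2 peering links with n-1 peer routes in every route table.

```python
import ipaddress
from itertools import combinations


def build_route_tables(cidrs, peerings):
    """Each VPC gets its local CIDR plus a route for every *directly* peered VPC."""
    tables = {vpc: [(ipaddress.ip_network(c), "local")] for vpc, c in cidrs.items()}
    for link, (a, b) in peerings.items():
        tables[a].append((ipaddress.ip_network(cidrs[b]), link))
        tables[b].append((ipaddress.ip_network(cidrs[a]), link))
    return tables


def can_reach(tables, src_vpc, dst_ip):
    """True if src_vpc's route table has a route covering dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net, _ in tables[src_vpc])


def full_mesh(vpcs):
    """One (hypothetically named) peering link per pair of VPCs."""
    return {f"pcx-{a}{b}": (a, b) for a, b in combinations(sorted(vpcs), 2)}
```

Note that the model only checks route tables. In AWS, even a manually added route in VPC-A for VPC-C’s CIDR via the A-B peering link would not help, because a peering connection does not forward traffic onward through VPC-B.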

Third-party routing solutions can be used to enable transitive routing, but they undermine the routing infrastructure provided by AWS.

Recent Updates

AWS now supports accessing load balancers over Inter Regional Peering Links

https://aws.amazon.com/about-aws/whats-new/2018/10/network-load-balancer-now-supports-inter-region-vpc-peering/

Custom VPC and Internet Access in AWS

Create your VPC, launch EC2 instances and get internet access with Public IP.

With a Virtual Private Cloud (VPC), a tenant can create their own cloud-based infrastructure in AWS. While AWS provides a default VPC for a new tenant, there are always use cases that need a custom VPC.

While exploring custom VPCs, I found that getting my EC2 instances in the custom VPC to connect to the internet was not straightforward and involved a few steps.

The next sections describe the steps needed to give the EC2 instances internet access with a public IP.

1. Creating the Custom VPC and associated Subnet

To create a custom VPC, navigate to Networking & Content Delivery > VPC > VPCs, click on “Create VPC”, and provide a name for your custom VPC and an associated CIDR.

VPC1

VPCs are tied to the Region you are logged in to. If this Region has multiple Availability Zones, you may want to create one subnet per Availability Zone by further subnetting the CIDR associated with the VPC. In my case, I created only one subnet for the custom VPC.
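Splitting the VPC CIDR into one subnet per Availability Zone is simple prefix arithmetic. The following Python sketch uses the standard ipaddress module; the 10.0.0.0/16 block and the AZ names in the usage example are assumed example values, not the ones from my setup. Keep in mind that AWS reserves five addresses in every subnet (the first four and the last one), so a /24 gives 256 - 5 = 251 usable addresses.

```python
import ipaddress


def subnets_per_az(vpc_cidr, azs, new_prefix):
    """Assign consecutive /new_prefix blocks of vpc_cidr to the given AZs, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of equal-size subnets
    plan = {}
    for az in azs:
        block = next(blocks, None)
        if block is None:
            raise ValueError(
                f"{vpc_cidr} has too few /{new_prefix} blocks for {len(azs)} AZs")
        plan[az] = block
    return plan


def usable_addresses(subnet):
    """AWS reserves the first four and the last address of each subnet."""
    return subnet.num_addresses - 5
```

For example, subnets_per_az("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 24) assigns 10.0.0.0/24, 10.0.1.0/24 and 10.0.2.0/24 to the three zones.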

VPC2

2. Creating an Internet Gateway for the custom VPC

The next step is to create an internet gateway, which connects the VPC to the internet. To create it, navigate to Networking & Content Delivery > VPC > Internet Gateways and click on “Create internet gateway”.

VPC4

Next, associate this internet gateway with your custom VPC using the Actions menu after selecting the newly created internet gateway. After this step, the gateway should show an associated VPC.

VPC6

3. Update Routing Table for the VPC to allow traffic to and from Internet

Next, we have to update the route table with a default route so that traffic can be sent to and received from the internet.

To do this, navigate to Networking & Content Delivery > VPC > Route Tables, select the route table associated with the custom VPC, and click on the “Routes” tab. Then click on edit and add a default route (destination 0.0.0.0/0) with the internet gateway as the target. Finally, click “Save” to update the route table.

VPC7

4. Launching EC2 instances on the custom VPC and checking connectivity

Finally we are ready to launch our EC2 instance and make it internet accessible using Public IP.

To do this, navigate to Compute > EC2 and click on “Launch Instance”.

Follow the usual steps to launch the instance, except that in the “Configure Instance” stage you select your custom VPC for the network (this automatically loads the associated subnet) and enable Auto-assign Public IP.

VPC21

You will be asked to create a new Security Group for the VPC (allow SSH access for testing).

Once launched you should be able to SSH to the instance using its Public IP.

VPC9

Stateful vs Stateless firewalls: Which one to use when?

Firewalls filter traffic and protect the trusted environment from the untrusted one. A firewall can be stateful or stateless.

A stateful firewall is capable of tracking connection states, so it is better equipped to allow or deny traffic based on that knowledge. A TCP connection, for example, goes through the handshake (SYN, SYN+ACK, ACK) to the ESTABLISHED state, and is finally CLOSED. A stateful firewall can detect these states. If a packet belongs to an already running flow, it can be allowed, while a new connection from an untrusted host can be dropped.

Let’s take a scenario to understand this better

A client sitting behind a firewall connects to a web server, www.example.com, and receives a reply.

Let’s see what configuration of the stateful and stateless firewalls is needed to make this communication work.

Stateful Firewall configuration:

# Generic rule to allow clients to connect to any
# webserver on the internet
Allow traffic going out to port 80
Allow traffic related to connections initiated by any internal client back to the same client
Deny any other traffic coming in to the client

Stateless Firewall configuration:

Allow traffic going out to port 80 on www.example.com
Allow traffic coming from host www.example.com and port 80
Deny any other traffic coming in to the client

From the above it is clear that the stateful firewall will allow incoming traffic only if it is related to connections the client has started. Also, note that this makes it possible to write generic rules for a stateful firewall.


Slide1

A stateless firewall, on the other hand, does not have any knowledge of which connections the client has initiated; instead, it depends purely on attributes of the packet, like source and destination address, to make the allow or deny decision.
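The difference is easy to see in a toy model. The following Python sketch encodes the two rule sets above: the stateless filter only looks at each packet’s addresses and ports, while the stateful one remembers outbound flows and admits inbound packets only if they are replies. It is a simplification, not how a real firewall is built; real connection trackers also follow the TCP handshake and expire idle entries, which this sketch leaves out. The IP addresses and ports in the usage are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    outbound: bool  # True when the packet leaves the protected client


class StatelessFirewall:
    """Static rules: talk to one known web server on port 80, nothing else."""

    def __init__(self, server: str, port: int = 80):
        self.server, self.port = server, port

    def allow(self, p: Packet) -> bool:
        if p.outbound:
            return p.dst == self.server and p.dport == self.port
        # Inbound: anything that *looks* like it comes from the server is accepted.
        return p.src == self.server and p.sport == self.port


class StatefulFirewall:
    """Generic rules: any outbound web connection, inbound only for tracked flows."""

    def __init__(self, port: int = 80):
        self.port = port
        self.flows = set()  # (client, client_port, server, server_port)

    def allow(self, p: Packet) -> bool:
        if p.outbound:
            if p.dport != self.port:
                return False
            self.flows.add((p.src, p.sport, p.dst, p.dport))
            return True
        # Inbound: accept only the reverse direction of a flow we saw going out.
        return (p.dst, p.dport, p.src, p.sport) in self.flows
```

Note how the stateless filter accepts an unsolicited packet from the server’s port 80 to any client port, because it cannot tell a reply from a new packet, whereas the stateful filter drops it.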

Why do we need stateless Firewalls?

A stateful firewall needs to track each connection that passes through it and maintain the state of all active connections. New connections are constantly added and expired connections are purged from the connection state maintained by the firewall. This requires a lot of resources (memory, CPU) on the firewall and is therefore costly.

Another consideration is load balancing traffic across multiple firewalls. In the case of stateful firewalls, the connection state must be synchronized across the firewalls to provide a consistent view of active connections.

A stateless firewall, on the other hand, deals with a single packet at a time, so the resources needed by such a filtering process are much lower.

When to use Stateless firewall?

A stateless firewall can be a faster and less resource-intensive alternative in the following cases:

  • Server-side firewall: If you are running a purely server application with well-known ports on a machine, the firewall can be explicitly programmed to allow connections to and from the server port. As the server ports are known to the firewall and the server expects new connections anyway, a stateless firewall can handle this use case.
  • Client-side firewall: A client program that strictly connects to a small set of trusted (internal) hosts can be protected by a stateless firewall with specific rules.

A stateful firewall on the other hand can be used to protect client applications which connect to a large number of untrusted hosts (webservers on internet, peer-to-peer traffic).  The connection tracker on the stateful firewall will only allow incoming packets which are related to communications started by the internal clients. All new traffic trying to reach the client application will be dropped by the stateful firewall.

Home network traffic analysis with a Raspberry Pi 3 and Ntop

I had a Raspberry Pi lying around for some time without doing any major function, and the same was true of the NetGear switch [1]. So, I decided to do a weekend project to implement traffic analysis on my home network.

I have a PPPoE connection to my ISP that connects to my home router [2]. The router provides both wired and Wi-Fi connectivity. As with most people, I have very few devices that connect to the router over an Ethernet cable; most devices use Wi-Fi. This makes traffic monitoring on the LAN side a bit of a problem.

To get around the problem I decided to put the traffic monitor on the WAN side of the router.

The following figure shows the connectivity.

Slide1

Tapping the WAN side with port mirroring

The NetGear GS105E switch provides port mirroring. I used it to mirror the traffic passing between the router and the ISP connection. The mirrored traffic is passed on to the Raspberry Pi, and all traffic monitoring happens on the Pi.

Screenshot from 2018-02-11 01:26:51

Monitoring tools

Once the traffic is available on the mirror port, I was able to run traffic monitors like Wireshark, tshark and tcpdump on it to analyze all the traffic between the router and the ISP. These tools give a live view of the packets going through my home network.

To monitor traffic over a long time, I used ntop [3], which can aggregate the traffic and produce a nice traffic analysis summary. I used the Raspbian [4] image for the Pi, and ntopng can be easily installed using apt.

Accessing the Monitoring result

As the Gigabit port of the Pi is used to receive mirrored traffic, the monitoring dashboard is accessed over the wlan0 interface. This will keep the monitored traffic separate from the monitoring traffic.

Refs:

[1] https://www.netgear.com/support/product/GS105Ev2.aspx

[2] https://www.amazon.in/3G-4G-LTE-Router-Multi-WAN/dp/B00N0W4FTM

[3] https://www.ntop.org/products/traffic-analysis/ntop/

[4] https://www.raspberrypi.org/downloads/raspbian/


Using Mininet to test OpenStack Firewall drivers

OpenStack FWaaS project will be supporting a Layer 2 firewall based on OVS flow rules. While working on the OVS driver, I felt the need to do some quick tests to check if the flow rules are programmed correctly on the OVS bridge.

Although we can run a complete devstack system within a VM, running a firewall test would mean running at least 2 nova instances and configuring their ports with FWaaS and SG policies. Without additional automation, this needs a lot of manual configuration.

With multiple nova instances running within the devstack VM, it quickly becomes a resource hog on a laptop. Manually bringing up a topology, even the simplest one, is painful and error-prone for the quick setup and tear-down I needed.

This is why I decided to test the new OVS Firewall driver using Mininet. Mininet provides an easy way to set up a network topology using network namespaces as hosts connected to OVS-based switches (although other switch types are also available).

The test setup for the Firewall Driver

For my tests, I needed a simple 2-node topology connected to an OVS switch. The firewall policies are pushed using OpenFlow rules. Mininet supports OVS switches either connected to SDN controllers or as standalone bridges called OVSBridge.

Slide1

For the tests, we need a standalone OVS bridge, and the Firewall driver will be used to configure the flow rules.

To start the simulation, all we need is to specify the topology, and Mininet deploys it with bridges and namespaces. The topology can either be custom-made using host, switch and link definitions (e.g. link1) or be one of the prepackaged topologies that come with Mininet (link2).

Compared to the nova instance VMs, this is very light on resources.

Configuration and access to hosts

One of the good things about mininet topology is that all the components come up with default config. The Namespaces are connected to the bridge and configured with IP address, all interfaces are up and IP connectivity is available as soon as the topology is deployed. Mininet also provides APIs to access the configuration. This helps in easy customization of the configuration and integration of the topology with the rest of the automation.

In the default configuration, the OVS bridge is configured with a single flow rule to make it work like a normal switch. For my tests, I had to remove this rule as more detailed flow configuration will be pushed by the Firewall driver based on the security policy.

It is also possible to run custom commands within the hosts. This is important, as we need to start processes that open TCP ports to check the firewall policy. I used netcat to start a TCP server on each host and check the connectivity through the firewall.

Demo

Here is a demo of the simulator in action. The SG and FWaaS policies are configured to allow ICMP and TCP port 8000 traffic. The policies are tested with firewall driver set to SG, FWaaS and both.


The following CLI can be used to view a running trace of the rule hit counts in OVS.

sudo watch 'ovs-ofctl dump-flows s1|grep -v n_packets=0|sed -r "s/\S+//2"|sed -r "s/\S+//5"| sed -r "s/\S+//1"'
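If you prefer to post-process the output in a script rather than with the sed chain, the same filtering can be sketched in Python: keep only flows whose n_packets is non-zero and drop the fields that change on every poll. Which fields count as volatile is my own choice here, and the sample input in the usage is hand-written in the ovs-ofctl dump-flows format, not captured from the demo.

```python
import re
from typing import List

# Fields that change between polls and only add noise to a hit-count view.
VOLATILE = ("cookie=", "duration=", "n_bytes=", "idle_age=", "hard_age=")


def flows_with_hits(dump: str) -> List[str]:
    """Return dump-flows lines that matched packets, minus volatile fields."""
    kept = []
    for line in dump.splitlines():
        m = re.search(r"\bn_packets=(\d+)", line)
        if m is None or int(m.group(1)) == 0:
            continue  # header line, or a flow that has not been hit yet
        fields = [f for f in line.strip().split(", ") if not f.startswith(VOLATILE)]
        kept.append(", ".join(fields))
    return kept
```

On a live switch, you could feed it subprocess.run(["ovs-ofctl", "dump-flows", "s1"], capture_output=True, text=True).stdout, run with sudo as in the watch command above.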

The simulator code

The complete code for this automation is quite small and is available as a single file. It allows deploying the 2-node topology and configuring the firewall policy and once the configuration is done the user lands on the CLI prompt. This can be used to do the traffic tests.

The details of using the script can be found in the README.


SecureNet: Simulating a Secure Network with Mininet

I have been working with OpenStack(devstack) for a while and I must say it is quite convenient to bring up a test setup using devstack. At times, I still feel it is an overkill to use devstack for a quick test to verify your understanding of the network/security rules/routing etc.

This is where Mininet shines. It is very light on resources and extremely fast at getting your topology up and running. It cuts the setup down to the absolute necessities, and needless to say, it has proven invaluable to me while trying out various topologies and tests.

Lacking Security Device simulation

The default Mininet toolbox comes with switch and host nodes. The switch is primarily a controller-based SDN switch like OpenVSwitch or IVS. The primary focus of Mininet has been L2 networks with SDN controllers.

My goal, however, was to test routing and security much like what is available in OpenStack. I found an example topology in the Mininet code simulating a LinuxRouter: basically, a Host node (namespace) configured to do L3 forwarding.

This gave me the initial idea of implementing security devices with Mininet; after all, the reference network services (routing and firewall) in OpenStack are based on namespaces.

A Simulated Perimeter Firewall

As iptables is available within the namespace, I decided to use it to implement a Perimeter Firewall that would inspect the traffic between the networks. I used a separate chain to hold my firewall rules and redirected all traffic hitting the FORWARD chain to my custom chain PFW. The last rule in the FORWARD chain drops all traffic.

So now when I started my topology with the perimeter Firewall the ping test confirmed that no traffic was flowing between the networks.

MN1

To allow traffic we need to specifically configure the Firewall to allow packets.

MN3

This took care of securing the network boundary, but what about traffic flowing within the network? OpenStack handles this with micro-segmentation using Security Groups.

Simulating Micro Segmentation with OpenStack like Security Groups

As I was using OpenVSwitch in standalone mode for my topology, I decided to use the OVS packet filtering capabilities to implement a firewall within the network itself. Here is a sample OVS rule that does an L3 packet match and filter:

ovs-ofctl add-flow switch1 "dl_type=0x0800,nw_src=30.0.0.100,nw_dst=30.0.0.101,actions=drop"

This produces the following flow (which can be verified with ovs-ofctl dump-flows switch1):

cookie=0x0, duration=9.050s, table=0, n_packets=0, n_bytes=0, idle_age=9, ip,nw_src=30.0.0.100,nw_dst=30.0.0.101 actions=drop

Updated CLI to allow Firewall Rules

All this looks good but it’s a lot of work to manually configure all the rules. And I was thinking, how can this be automated?

I extended the switch node in Mininet to implement a Secure Switch, and while at it, I also implemented the Perimeter Firewall as a specialized Host node. My intention in implementing these special nodes was to add the capability of configuring them with security settings. By this time, it was already looking like an interesting weekend project 🙂

So, I got into it and implemented a set of CLI commands for Mininet to configure the firewall capabilities on the topology (using the security nodes I introduced). I called it the Secure Network.

I wanted to keep things simple to start with, so I kept the rule definitions very coarse, namely L3 filtering only (I may look at L4 in the future). Here is the CLI I came up with:

Securenet] secrule add allow [addr1] to [addr2]
Securenet] secrule add deny [addr1] to [addr3]

Here addr1, addr2 and addr3 are Micro Segments (or Security Groups).

Now, to define Firewall rules with Micro Segments (let’s call them Security Groups, as this is what I was trying to simulate in the first place), I needed a way to define Security Groups and associate Hosts with them.

To keep things simple, I went for automatic creation of Security Groups whenever a Host association command is entered. This was done by extending the host commands. Here is an example of creating a Security Group and associating a Host with it:

Securenet] h30 bind sg1

The above command creates a Security Group called sg1 and associates host h30 with it.

To add another Host to the same SG, issue the same command with a different Host name:

Securenet] h31 bind sg1

This adds h31 to sg1. To view the members of the security group, use the sghosts command:

Securenet] sghosts sg1
0: h30
1: h31

Reacting to Security Group changes with events

By this time, I was already too excited to stop and was thinking of the interactions between the Firewall rules and Security Group definitions.

As the high-level Firewall rules are composed of Security Groups, the firewall must react to changes in Security Group definitions. This means a change in a Security Group definition must produce some kind of event that is observable by the Firewall, which must then update its configuration accordingly to keep the firewall rule constraints satisfied.

To do this, I extended the Topology and Mininet classes so that any change in an SG definition triggers a re-evaluation of the Firewall rules.

MN4

Again to keep things reasonably simple I went with a rip and replace of Firewall configuration (both IPTables and OVS) and did not bother about the traffic impact (we are in simulation after all). But this might be an interesting area to explore in future.

Simulating IPSets

One thing I was missing while defining the Firewall rules was a way to target a set of external IP addresses. As the IP addresses associated with a Security Group were derived from its member Hosts, there was no way to define rules based on addresses that are not part of the topology.

So, another set of CLI commands was introduced to define groups of IP addresses manually, without any topology-based constraints.

Securenet] secrule ipg add ipg1 1.1.1.1
Securenet] secrule ipg add ipg2 2.1.1.1
Securenet] secrule ipg add ipg3 2.1.1.2

CLI to view a list of IPGs

Securenet] secrule ipg list
0: ipg2
1: ipg3
2: ipg1

And to view its content use the following

Securenet] secrule ipg show ipg1
0: ipgh-1.1.1.1/32

With IPGs it was possible to have rules with arbitrary addresses.
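To make the event-driven rip-and-replace approach concrete, here is a small Python sketch of how SG bindings, IPGs and secrule entries could be compiled into Perimeter Firewall rules. It is a hypothetical model, not the actual SecureNet code: the class and method names are mine, only the iptables side (into the PFW chain) is generated, and the OVS flows would be rebuilt the same way. Every change to an SG, IPG or rule regenerates the whole configuration.

```python
import ipaddress


class SecureNetModel:
    """Hypothetical model of SecureNet's groups and rules (not the real code)."""

    def __init__(self, host_ips):
        self.host_ips = host_ips   # host name -> IP, e.g. {"h30": "30.0.0.100"}
        self.sgs = {}              # SG name -> set of host names
        self.ipgs = {}             # IPG name -> set of ip_network objects
        self.rules = []            # (action, src_group, dst_group), in order added
        self.config = []           # last generated iptables commands

    def bind(self, host, sg):
        # "hN bind sgX": the SG is created implicitly on first use.
        self.sgs.setdefault(sg, set()).add(host)
        self._regenerate()         # SG change event -> re-evaluate all rules

    def ipg_add(self, ipg, addr):
        # A bare address such as "1.1.1.1" becomes the network 1.1.1.1/32.
        self.ipgs.setdefault(ipg, set()).add(ipaddress.ip_network(addr))
        self._regenerate()

    def secrule_add(self, action, src, dst):
        self.rules.append((action, src, dst))
        self._regenerate()

    def _addresses(self, group):
        if group in self.sgs:
            return sorted(ipaddress.ip_network(self.host_ips[h]) for h in self.sgs[group])
        return sorted(self.ipgs.get(group, ()))

    def _regenerate(self):
        # Rip and replace: flush the PFW chain, then rebuild every rule.
        target = {"allow": "ACCEPT", "deny": "DROP"}
        cfg = ["iptables -F PFW"]
        for action, src, dst in self.rules:
            for s in self._addresses(src):
                for d in self._addresses(dst):
                    cfg.append(f"iptables -A PFW -s {s} -d {d} -j {target[action]}")
        self.config = cfg
```

Binding a new host to an SG that already appears in a rule immediately adds the corresponding iptables entries, which mirrors the SG change events described above.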

Conclusion

Here is a run of a simple set of commands on the secure net topology

MN5

The updated IPTables config in the Perimeter Firewall.

MN6

And the security rules in OVS

MN7

This was more of a fun long weekend project for me. Once I dived into the project I realized a number of challenges that are posed by such an undertaking like rule optimization, minimizing the config changes, targeting the right device to push a security rule, figuring out duplicate rules and conflicts etc. Also, as I thought through more use cases I realized that the firewall rule definition needs a language of its own.

But at the end of the weekend, I mostly felt it had been a lot of fun.

Control Plane for our virtual network

In the last two blogs, I went through the process of developing a VPN-based virtual network. One thing we ignored is the amount of configuration that has to change to add or remove nodes or to provision new edge routers.

Some of these steps, like connecting the routers over L3 links and setting up the VPN tunnels, are part of the infrastructure provisioning and can be considered static, one-time activities. Others are very transient, like adding a device to the network or adding another edge router.

Managing such a network manually is time-consuming and error-prone. In this blog, I will explore ways to automate this activity.

Let’s talk about adding a device to our network

What does it take to add a new device to the network?

Slide4

In terms of provisioning

  • First we need to know the IP address associated to the device.
  • We need to add a host route to all the edge routers (local and remote) to enable communication between this new device and the devices already present in the network.

The reverse needs to happen when a device is taken off the network: the host routes associated with the device need to be removed.

How about adding an Edge Router?

Adding a new edge router is a little more involved. Let’s say we have to add a new edge router to an existing network and also add a new device to it (because that is mostly the only reason to add a new edge router).

In terms of provisioning

  • The new edge router must add host routes for all the devices existing in the network.
  • Then it must follow the same steps as described earlier to add the new device to the network.

Removing an edge router means removing all the devices connected to it. In most cases, removing an edge router with devices still connected may not be a valid case (but we can always think of a system failure that causes such a situation).
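Before handing this job to a routing protocol, it helps to write down the state it has to maintain. The following Python sketch is a hypothetical model (not GoBGP or any real API) of the host routes each edge router needs: a /32 route via the device-facing interface on the local router, and via the owning router’s L3 link address everywhere else. Router names, interfaces and addresses in the usage are example values in the style of the previous blog.

```python
class EdgeNetwork:
    """Hypothetical model of the host routes each edge router needs."""

    def __init__(self):
        self.routers = {}   # router name -> its L3 link IP, e.g. "1.0.0.1"
        self.devices = {}   # device IP -> (owning router, device-facing interface)
        self.tables = {}    # router name -> {"<ip>/32": interface or next hop}

    def add_router(self, name, link_ip):
        self.routers[name] = link_ip
        self.tables[name] = {}
        # A new router must learn host routes for every existing device.
        for ip, (owner, _ifname) in self.devices.items():
            self.tables[name][f"{ip}/32"] = self.routers[owner]

    def add_device(self, ip, router, ifname):
        self.devices[ip] = (router, ifname)
        for name, table in self.tables.items():
            # Local router: out of the device-facing interface; others: via the owner.
            table[f"{ip}/32"] = ifname if name == router else self.routers[router]

    def remove_device(self, ip):
        self.devices.pop(ip)
        for table in self.tables.values():
            table.pop(f"{ip}/32", None)  # retract the host route everywhere
```

Each method corresponds to one of the events above; these are exactly the route announcements and withdrawals that a routing protocol would distribute automatically.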

How to automate?

From the above discussion, we can see that the job of adding and removing devices and edge routers is made of two tasks.

  • Distributing information about the device joining and leaving the network. i.e. distributing host routes when the device is added to the network and retracting the route when the device leaves
  • Adding host routes to the edge routers. This step is more about pushing(programming) the route into the routing table of the edge router

This job description exactly fits the skills of a routing protocol 🙂

BGP as control plane

BGP is well known for its ability to distribute routes and for its scalability. Let’s explore how BGP can be used to distribute the routes in both cases above. Fortunately, BGP is designed for such a job and already takes care of cases like a new edge router joining the network.

For our example, we will use GoBGP to distribute the host routes. All the edge routers run BGP and peer with each other.

If you are interested in the details, please look at my previous blogs exploring GoBGP and how to publish routes.

Adding a device node must trigger a route publish on BGP.

Adding an edge router is automatically taken care of by the new edge router forming BGP peerings with the rest of the edge routers and receiving all the existing host routes.

Other Options: Using a Controller

The other option is to use an SDN controller that listens for events like a device joining or leaving, or a new edge router joining the network, and sends commands to orchestrate the edge routers.

This is a more centralized and ground up approach, but it gives you the flexibility of designing your own event handlers.

We have already discussed the events that need to be handled in the previous section, so I will not go into the details of a controller design (not in this post :))

Programming the routes

The second task in automating device join/leave is programming the routes into the edge routers. This can be done with agents running on the edge routers that monitor BGP, or listen for commands from the SDN controller, and configure the routes accordingly.
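Here is a minimal sketch of the core step of such an agent. It assumes the agent already knows the host routes it wants (from BGP or from the controller) and the routes currently installed. It computes the difference and emits iproute2 commands. The function name `plan_route_changes` and the node 20.0.0.5 are hypothetical, and the sketch only prints commands rather than executing them. It uses `ip route replace` so that re-applying the same route is harmless.

```python
def plan_route_changes(installed, desired):
    """Return the ip commands that turn `installed` into `desired`.

    Both arguments map a host route (e.g. "20.0.0.3/32") to its next-hop IP.
    """
    commands = []
    # Routes that are no longer wanted (the device left the network).
    for prefix in sorted(installed.keys() - desired.keys()):
        commands.append(f"ip route del {prefix}")
    # New routes, or routes whose next hop changed.
    for prefix in sorted(desired):
        if installed.get(prefix) != desired[prefix]:
            commands.append(f"ip route replace {prefix} via {desired[prefix]}")
    return commands


installed = {"20.0.0.3/32": "1.0.0.2", "20.0.0.4/32": "1.0.0.2"}
# 20.0.0.4 left the network, hypothetical node 20.0.0.5 joined behind the same router.
desired = {"20.0.0.3/32": "1.0.0.2", "20.0.0.5/32": "1.0.0.2"}
for cmd in plan_route_changes(installed, desired):
    print(cmd)
```

Working from a diff rather than re-adding everything keeps the agent idempotent. Running it twice with the same inputs produces no further changes.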

Designing an L2 virtual network

In the previous blog, we saw how we can connect two devices in the same subnet over an L3 device. With some configuration to enable Proxy ARP and host routes on the L3 device, we were able to simulate an L2 network over an L3 device.

The important thing to note from the previous blog is that we did not make any change on the device nodes. All changes were made to the router.

So, what can we achieve with this capability?

In this blog, I will be taking that experience to the next level by designing a virtual L2 network using routers.

But most importantly we will be using only common Linux utilities and kernel features to achieve this.

Connecting nodes across multiple routers

Let’s start by setting up a simple topology for this experiment. We have two routers, each connecting two device nodes. The routers themselves are connected to each other using an L3 link.

[Figure: two routers, each connecting two device nodes, joined by an L3 link]

In a way, each router with its nodes is a replica of the setup from my first blog: the end nodes are the devices and the middle node is the router.

Device    Interface  IP Address
Node_1    node_if0   20.0.0.1/24
Node_2    node_if0   20.0.0.2/24
Vrouter1  vr_if1     None
          vr_if2     None
          eth1       1.0.0.1/24
Node_3    node_if0   20.0.0.3/24
Node_4    node_if0   20.0.0.4/24
Vrouter2  vr_if1     None
          vr_if2     None
          eth1       1.0.0.2/24

Look at the commands from my previous blog to see how to bring up this topology.

Making it work

To make it work, each router needs to be configured with Proxy ARP (refer to the previous blog). Each router must also be configured with host routes for all the device nodes. This means host routes both for the devices directly connected to the router and for the ones connected via the second router.

Here is an example routing table on vrouter1 with the routes for the directly connected devices.

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
1.0.0.0         0.0.0.0         255.255.255.0   U     0      0        0 eth1
20.0.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if1
20.0.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if2

Use the following commands to set up the remote routes:

vrouter1 > ip route add 20.0.0.3/32 nexthop via 1.0.0.2
vrouter1 > ip route add 20.0.0.4/32 nexthop via 1.0.0.2

A similar route configuration is needed on vrouter2 to reach node_1 (20.0.0.1) and node_2 (20.0.0.2), this time with 1.0.0.1 as the next hop.
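Every router needs a host route for every device node, so the commands follow mechanically from the address table above. The sketch below derives them for either router. It assumes node_3 and node_4 sit behind vr_if1 and vr_if2 of vrouter2, mirroring vrouter1. The `TOPOLOGY` dictionary and `host_route_commands` function are hypothetical, and the output is only printed, not applied.

```python
# Hypothetical description of the two-router topology from the table above.
TOPOLOGY = {
    "vrouter1": {"eth1": "1.0.0.1", "nodes": {"20.0.0.1": "vr_if1", "20.0.0.2": "vr_if2"}},
    "vrouter2": {"eth1": "1.0.0.2", "nodes": {"20.0.0.3": "vr_if1", "20.0.0.4": "vr_if2"}},
}


def host_route_commands(router):
    """Return the ip route commands `router` needs to reach every device node."""
    commands = []
    # Directly connected nodes: route out of the interface the node is attached to.
    for ip, dev in TOPOLOGY[router]["nodes"].items():
        commands.append(f"ip route add {ip}/32 dev {dev}")
    # Nodes behind the other router: route via that router's eth1 address.
    for peer, cfg in TOPOLOGY.items():
        if peer == router:
            continue
        for ip in cfg["nodes"]:
            commands.append(f"ip route add {ip}/32 via {cfg['eth1']}")
    return commands


for cmd in host_route_commands("vrouter2"):
    print(cmd)
```

For vrouter2 this yields the two local `dev` routes plus routes to 20.0.0.1 and 20.0.0.2 via 1.0.0.1.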

Caveat: I found that L2 learning did not work properly when I connected multiple routers. To rescue the situation, I added static ARP entries.

Create the static ARP entries manually on each router, for its directly connected devices, using the following command:

arp -s 20.0.0.${node_no} $node_mac
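The placeholders `${node_no}` and `$node_mac` have to be filled in for every node. The sketch below generates the commands for the nodes behind one router. It pins each entry to the router interface the node hangs off with `arp -i`. The MAC addresses are made-up, locally administered values. Substitute the real ones from each node's `node_if0`, e.g. as shown by `ip link show node_if0` on that node. The `NODES` table and `static_arp_commands` function are hypothetical.

```python
import re

# Hypothetical node table for vrouter1: node IP -> (router interface, node MAC).
NODES = {
    "20.0.0.1": ("vr_if1", "02:00:00:00:00:01"),
    "20.0.0.2": ("vr_if2", "02:00:00:00:00:02"),
}

# Six colon-separated pairs of hex digits, e.g. 02:00:00:00:00:01.
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def static_arp_commands(nodes):
    """Build one `arp -s` command per directly connected node."""
    commands = []
    for ip, (dev, mac) in nodes.items():
        if not MAC_RE.fullmatch(mac):
            raise ValueError(f"bad MAC address for {ip}: {mac!r}")
        commands.append(f"arp -i {dev} -s {ip} {mac}")
    return commands


for cmd in static_arp_commands(NODES):
    print(cmd)
```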

With the above taken care of, you should be able to ping nodes across multiple routers.

node_1 > ping 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=62 time=437 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=62 time=1.02 ms
64 bytes from 20.0.0.3: icmp_seq=3 ttl=62 time=0.947 ms
^C
--- 20.0.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.947/146.511/437.563/205.804 ms

L3 fabric as a transport

Now that we have a workable L2 network that can span across multiple routers, is there anything else required to make this solution work in a generic environment?

If you look closely, you will realize that although the routers in our experiment are connected to each other over an L3 link, the host routes on our end routers depend on knowing the exact path that connects them. For example, if we introduce a third router between them, our solution stops working: the remote host routes now need to be adjusted to account for the extra hop introduced by the third router. Additionally, every router on the path connecting the end routers needs to be aware of the host routes for the device nodes.

VPN tunnels to the rescue

The way to solve this is to set up tunnels between the routers that connect the device nodes. This way, the L3 path does not need to know the host routes, and the host routes are isolated from changes in the L3 path that connects the end routers.

We can set up GRE tunnels between the end routers, but we can also make the connections over SSH tunnels.

Use the following commands to set up a GRE tunnel on vrouter1:

vrouter1 > ip tunnel add tun0 mode gre remote 1.0.0.2 local 1.0.0.1 ttl 255
vrouter1 > ifconfig tun0 2.0.0.1/24 up

Or

use an SSH-based tunnel, as described at

http://man.openbsd.org/ssh#SSH-BASED_VIRTUAL_PRIVATE_NETWORKS

So now we have our L3 Virtual Private Network, which isolates us from the impact of changes in the path. Make sure that you use the tunneled path to reach the remote nodes. To do this, adjust the host routes to use the tunnel paths (`replace` overwrites the existing routes via 1.0.0.2, which `add` would reject as already existing):

vrouter1 > ip route replace 20.0.0.3/32 nexthop via 2.0.0.2
vrouter1 > ip route replace 20.0.0.4/32 nexthop via 2.0.0.2
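Only vrouter1's side is shown above. vrouter2 needs the mirror image: `local` and `remote` swapped, 2.0.0.2/24 on its tun0, and host routes towards vrouter1's tunnel address. The sketch below generates either side from one table so the two ends cannot drift apart. It uses `ip addr add` and `ip link set ... up` as the iproute2 equivalent of the `ifconfig` line. The `ROUTERS` table and `gre_side` function are hypothetical, and the commands are only printed.

```python
# Hypothetical per-router parameters taken from the address table of the final topology.
ROUTERS = {
    "vrouter1": {"underlay": "1.0.0.1", "tunnel": "2.0.0.1", "nodes": ["20.0.0.1", "20.0.0.2"]},
    "vrouter2": {"underlay": "1.0.0.2", "tunnel": "2.0.0.2", "nodes": ["20.0.0.3", "20.0.0.4"]},
}


def gre_side(local, remote):
    """Commands for `local` to build tun0 towards `remote` and route the remote nodes over it."""
    me, peer = ROUTERS[local], ROUTERS[remote]
    commands = [
        f"ip tunnel add tun0 mode gre remote {peer['underlay']} local {me['underlay']} ttl 255",
        f"ip addr add {me['tunnel']}/24 dev tun0",
        "ip link set tun0 up",
    ]
    # Remote host routes point at the peer's tunnel address, not its eth1 address.
    # `replace` also overwrites any earlier route for the same prefix via eth1.
    commands += [f"ip route replace {ip}/32 via {peer['tunnel']}" for ip in peer["nodes"]]
    return commands


for cmd in gre_side("vrouter2", "vrouter1"):
    print(cmd)
```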

Putting it all together

Finally, this is how the complete topology looks.

[Figure: complete topology, with a tunnel between vrouter1 and vrouter2]

Here is a summary of the IP addresses and interfaces on each node.

Device    Interface  IP Address
Node_1    node_if0   20.0.0.1/24
Node_2    node_if0   20.0.0.2/24
Vrouter1  vr_if1     None
          vr_if2     None
          eth1       1.0.0.1/24
          tun0       2.0.0.1/24
Node_3    node_if0   20.0.0.3/24
Node_4    node_if0   20.0.0.4/24
Vrouter2  vr_if1     None
          vr_if2     None
          eth1       1.0.0.2/24
          tun0       2.0.0.2/24

The routing table with tunnel paths enabled looks like the following.

[Figure: vrouter routing table with the tunnel paths enabled]

And the pings are still working 🙂

node_1 > ping 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=62 time=290 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=62 time=1.01 ms
64 bytes from 20.0.0.3: icmp_seq=3 ttl=62 time=0.973 ms
64 bytes from 20.0.0.3: icmp_seq=4 ttl=62 time=1.17 ms
^C
--- 20.0.0.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.973/73.499/290.832/125.477 ms

Using a Router for Switching

We all know that routers are Layer 3 devices and switches are Layer 2 devices. So how can a Layer 3 device be used to connect two or more devices at Layer 2?

In this blog, I will explore the mechanisms that make it possible to use a routing device as a switch.

The test setup

We start with a simple setup of three Linux network namespaces connected in series. The middle node acts as a router, and the end nodes act as the devices that need Layer 2 connectivity.

Use the following script to set up the namespaces and connections.

https://bitbucket.org/!api/2.0/snippets/xchandan/Bazz4/0a29fd79dc2ad2be4c816f8db788abfed3ac9282/files/router_topo_step1.sh

Configuration

The script adds the following IP addresses to the device nodes.

Device    Interface  IP Address
Node_1    node_if0   20.0.0.1/24
Node_2    node_if0   20.0.0.2/24
Vrouter1  vr_if1     None
          vr_if2     None

In this setup, we have two Layer 2 broadcast domains, A and B, connected by the router.

[Figure: Layer 2 broadcast domains A and B connected by the router]

It doesn’t work: finding out what’s going on

Now we have two devices in the same subnet connected by a router instead of a switch. Start a ping from one device node to the other and, predictably, it does not work. But running tcpdump on the router gives some insight into the traffic flow.

node_1 > ping 20.0.0.2
PING 20.0.0.2 (20.0.0.2) 56(84) bytes of data.
From 20.0.0.1 icmp_seq=1 Destination Host Unreachable
From 20.0.0.1 icmp_seq=2 Destination Host Unreachable
From 20.0.0.1 icmp_seq=3 Destination Host Unreachable
^C
--- 20.0.0.2 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms

And the following is the tcpdump output on the router:

vrouter1 > sudo tcpdump -i vr_if1 -n
tcpdump: WARNING: vr_if1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vr_if1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
13:29:08.265464 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:09.263145 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:10.263058 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:11.280501 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28
13:29:12.278951 ARP, Request who-has 20.0.0.2 tell 20.0.0.1, length 28

The ARP requests are not resolving.

As the device nodes are part of the same subnet, when we ping one device node from the other, the sending node first sends ARP queries to find the Layer 2 address for the destination IP. ARP uses a Layer 2 broadcast to find the MAC address of the destination. As our destination is not part of the same Layer 2 domain, the ARP broadcast never reaches it and is never answered.

The missing link that will make it work

We saw in the previous section the problem of unresolved ARP queries.

So how do we rescue this situation?

To make this setup work, we need to enable Proxy ARP on the router. When Proxy ARP is enabled on the router interfaces, each interface starts responding to ARP requests from the device nodes with its own MAC (Layer 2) address. In a way, it acts as a proxy for a device (with the destination IP) that is not actually present on the Layer 2 network.

To enable proxy ARP on the router, use the following commands.

vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if1/proxy_arp
vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if2/proxy_arp
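To reason about when the router answers, here is a simplified Python model of the decision. It assumes the Linux behaviour as I understand the kernel's ARP code. With `proxy_arp` enabled on the receiving interface, the router replies with that interface's MAC only if its routing table sends the target address out of a different interface. So if ARP still goes unanswered after enabling `proxy_arp`, the missing host routes added in the next step are a likely cause. The function `proxy_arp_reply`, the dictionaries and the MAC values are hypothetical. The model ignores other kernel conditions.

```python
def proxy_arp_reply(target_ip, in_dev, proxy_arp, routes, dev_mac):
    """Return the MAC the router would answer with, or None for no reply.

    proxy_arp: interface -> bool (the /proc/sys/net/ipv4/conf/<if>/proxy_arp flag)
    routes:    host IP -> outgoing interface (the router's host routes)
    dev_mac:   interface -> MAC address of that router interface
    """
    if not proxy_arp.get(in_dev, False):
        return None  # proxy ARP disabled: the request goes unanswered
    out_dev = routes.get(target_ip)
    if out_dev is None or out_dev == in_dev:
        return None  # no route, or the target is on the requester's own segment
    return dev_mac[in_dev]  # answer on behalf of the target with our own MAC


proxy_arp = {"vr_if1": True, "vr_if2": True}
routes = {"20.0.0.1": "vr_if1", "20.0.0.2": "vr_if2"}
dev_mac = {"vr_if1": "02:00:00:00:01:01", "vr_if2": "02:00:00:00:01:02"}  # made-up MACs

# node_1 (behind vr_if1) asks "who-has 20.0.0.2": the router answers with vr_if1's MAC.
print(proxy_arp_reply("20.0.0.2", "vr_if1", proxy_arp, routes, dev_mac))
```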

Taste of success

So now the ARP problem is resolved. The device node gets a reply to its ARP query with a MAC address that actually belongs to the router's interface. The device now knows which MAC address is associated with the destination IP.

But the pings are still failing.

This is because we have solved only half the problem.

What does the router do with the packet that it receives? Once the Layer 2 frame arrives at the router, the router removes the Layer 2 header and looks at the destination IP address. The router itself does not own that IP address.

So the router does what it does with any other IP packet: it looks up the routing table.

To make the ping work, we have to add host routes for the destination IP addresses on the router.

The following commands will achieve our goal.

vrouter1 > ip route add 20.0.0.1/32 dev vr_if1
vrouter1 > ip route add 20.0.0.2/32 dev vr_if2

This is what the routing table on the router looks like:

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
20.0.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if1
20.0.0.2        0.0.0.0         255.255.255.255 UH    0      0        0 vr_if2

One more thing to keep in mind. Don’t forget to enable IP forwarding on your router.

vrouter1 > echo 1 > /proc/sys/net/ipv4/ip_forward
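The forwarding step can be modelled in a few lines of Python with the standard `ipaddress` module. The model checks that forwarding is enabled, picks the most specific matching route, and decrements the TTL. This is a sketch of the logic, not of the kernel's implementation, and the `forward` function is hypothetical. It also explains the `ttl=63` in the ping output. node_2 sends its echo reply with the Linux default TTL of 64, and the router decrements it once.

```python
import ipaddress

# Host routes from the `route -n` listing above: destination network -> outgoing interface.
ROUTES = {
    ipaddress.ip_network("20.0.0.1/32"): "vr_if1",
    ipaddress.ip_network("20.0.0.2/32"): "vr_if2",
}


def forward(dst, ttl, ip_forward=True):
    """Return (outgoing interface, new TTL), or None if the router drops the packet."""
    if not ip_forward:
        return None  # /proc/sys/net/ipv4/ip_forward is 0: the router does not forward
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches or ttl <= 1:
        return None  # no route, or the TTL would expire at this hop
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix match
    return ROUTES[best], ttl - 1


print(forward("20.0.0.1", 64))  # echo reply from node_2 to node_1 -> ('vr_if1', 63)
```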

Finally, we can test the end to end traffic flow 🙂

node_1 > ping 20.0.0.2
PING 20.0.0.2 (20.0.0.2) 56(84) bytes of data.
64 bytes from 20.0.0.2: icmp_seq=1 ttl=63 time=0.167 ms
64 bytes from 20.0.0.2: icmp_seq=2 ttl=63 time=0.061 ms
^C
--- 20.0.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.061/0.114/0.167/0.053 ms