Stateful vs Stateless firewalls: Which one to use when?

Firewalls filter traffic and protect the trusted environment from the untrusted one. A firewall can be stateful or stateless.

A stateful firewall tracks connection states, so it is better equipped to allow or deny traffic based on that knowledge. A TCP connection, for example, goes through the handshake (SYN, SYN+ACK, ACK), reaches the ESTABLISHED state, and is finally CLOSED. A stateful firewall can detect these states. If a packet belongs to an already running flow it can be allowed, while a new connection from an untrusted host can be dropped.

Let’s take a scenario to understand this better

A client sitting behind a firewall connects to a web server and receives a reply.

Let’s see what configuration the stateful and the stateless firewall each need to make this communication work.

Stateful Firewall configuration:

# Generic rule to allow clients to connect to any
# webserver on the internet
Allow traffic going out to port  80
Allow traffic related to connections initiated by any internal client back to the same client
Deny any other traffic coming in to the client

Stateless Firewall configuration:

Allow traffic going out to port  80 on
Allow traffic coming from host and port 80
Deny any other traffic coming in to the client

From the above it is clear that the stateful firewall will allow incoming traffic only if it is related to connections the client has started. Also, note that this makes it possible to write generic rules for a stateful firewall.



A stateless firewall, on the other hand, has no knowledge of which connections the client has initiated. It depends purely on the attributes of the packet, such as source and destination address, to make the allow or deny decision.
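To make the difference concrete, here is a minimal Python sketch of the two decision procedures for the web client scenario above. The addresses, port numbers, and names such as StatefulFilter are made up for illustration, and the flow table is a plain set rather than a real connection tracker.

```python
from dataclasses import dataclass

WEB_SERVER = "203.0.113.5"   # hypothetical web server address


@dataclass(frozen=True)
class Packet:
    src: str        # source IP address
    sport: int      # source port
    dst: str        # destination IP address
    dport: int      # destination port
    outbound: bool  # True when the packet leaves the trusted side


def stateless_allow(pkt: Packet) -> bool:
    """Decide using only the fields of this one packet."""
    if pkt.outbound:
        return pkt.dport == 80
    # Inbound: anything from the web server's port 80 passes.
    return pkt.src == WEB_SERVER and pkt.sport == 80


class StatefulFilter:
    """Remember outbound flows and admit only their replies."""

    def __init__(self):
        self.flows = set()

    def allow(self, pkt: Packet) -> bool:
        if pkt.outbound:
            if pkt.dport != 80:
                return False
            self.flows.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))
            return True
        # Inbound: allowed only if it mirrors a recorded outbound flow.
        return (pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.flows


request = Packet("192.168.1.10", 40000, WEB_SERVER, 80, outbound=True)
reply = Packet(WEB_SERVER, 80, "192.168.1.10", 40000, outbound=False)
# Same server and source port, but aimed at a port the client never used.
unsolicited = Packet(WEB_SERVER, 80, "192.168.1.10", 40001, outbound=False)

fw = StatefulFilter()
print(fw.allow(request), fw.allow(reply), fw.allow(unsolicited))  # True True False
print(stateless_allow(unsolicited))                               # True
```

The unsolicited packet passes the stateless rules because its header fields look right, while the stateful filter rejects it because no matching outbound flow was recorded.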

Why do we need stateless Firewalls?

A stateful firewall needs to track every connection that passes through it and maintain the state of all active connections. New connections are continuously added, and expired connections are purged from the connection state maintained by the firewall. This requires a lot of resources (memory, CPU) on the firewall and is therefore costly.
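The bookkeeping described above can be sketched as a small table keyed by flow, with an idle timeout. This is only an illustration of the idea: the class name, the 60 second timeout, and the addresses are assumptions, and real connection trackers use per-protocol timers and much more compact data structures.

```python
class ConnectionTable:
    """Toy connection-state table with idle expiry."""

    def __init__(self, idle_timeout: float = 60.0):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow key -> time of the most recent packet

    def touch(self, flow, now: float) -> None:
        """Record a packet for this flow (adds the flow if it is new)."""
        self.last_seen[flow] = now

    def purge(self, now: float) -> int:
        """Drop flows idle for longer than the timeout; return how many."""
        expired = [f for f, t in self.last_seen.items()
                   if now - t > self.idle_timeout]
        for flow in expired:
            del self.last_seen[flow]
        return len(expired)

    def __len__(self) -> int:
        return len(self.last_seen)


table = ConnectionTable(idle_timeout=60.0)
table.touch(("192.168.1.10", 40000, "203.0.113.5", 80), now=0.0)
table.touch(("192.168.1.11", 40001, "203.0.113.5", 80), now=50.0)
removed = table.purge(now=70.0)
print(removed, len(table))  # 1 1
```

Every entry costs memory and every purge costs CPU time, which is where the resource cost comes from.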

Another consideration is load balancing traffic across multiple firewalls. With stateful firewalls, the connection state must be synchronized across all of them to provide a consistent view of active connections.

A stateless firewall, on the other hand, deals with a single packet at a time. Thus, the resources needed by such a filtering process are much smaller.

When to use Stateless firewall?

A stateless firewall can be a faster and less resource intensive alternative in the following cases:

  • Server side firewall: If you are running a purely server application with well-known ports on a machine, the firewall can be explicitly programmed to allow connections to and from the server port. As the server ports are well known to the firewall and the server expects new connections anyway, a stateless firewall can handle this use case.
  • Client side firewall: A client program which strictly connects to a small set of trusted (internal) hosts can be protected using a stateless firewall with specific rules.

A stateful firewall on the other hand can be used to protect client applications which connect to a large number of untrusted hosts (webservers on internet, peer-to-peer traffic).  The connection tracker on the stateful firewall will only allow incoming packets which are related to communications started by the internal clients. All new traffic trying to reach the client application will be dropped by the stateful firewall.


Home network traffic analysis with a Raspberry Pi 3 and Ntop

I had a Raspberry Pi lying around for some time without doing anything useful, and the same was true of the NetGear switch [1]. So, I decided to do a weekend project to implement traffic analysis on my home network.

I have a PPPoE connection to my ISP that connects to my home router [2]. The router provides both wired and wifi connectivity. Like most people, I have very few devices that connect to the router over an Ethernet cable; most devices use wifi. This makes traffic monitoring a bit of a problem on the LAN side.

To get around the problem I decided to put the traffic monitor on the WAN side of the router.

The following figure shows the connectivity.


Tapping the WAN side with port mirroring

The NetGear GS105E switch provides the capability of port mirroring. I used this to mirror traffic arriving through the router and the ISP connection. The mirrored traffic is passed on to the Raspberry Pi. All traffic monitoring happens on the Pi.



Monitoring tools

Once the traffic is available on the mirrored port, I was able to run traffic monitors like wireshark, tshark and tcpdump on the mirror port to analyze all the traffic between the router and ISP. These tools give a live view of the packets going through my home network.

To monitor traffic over a long period I used Ntop [3]. It can aggregate traffic and produce a nice analysis summary. I used the Raspbian [4] image for the Pi, and Ntopng can be easily installed from their repository using apt.

Accessing the Monitoring result

As the Ethernet port of the Pi is used to receive mirrored traffic, the monitoring dashboard is accessed over the wlan0 interface. This keeps the monitored traffic separate from the monitoring traffic.







Using Mininet to test OpenStack Firewall drivers

The OpenStack FWaaS project will be supporting a Layer 2 firewall based on OVS flow rules. While working on the OVS driver, I felt the need to do some quick tests to check if the flow rules are programmed correctly on the OVS bridge.

Although we can run a complete devstack system within a VM, running a firewall test would mean running at least 2 nova instances and configuring the ports with FWaaS and SG policies. And unless I include other means, it will need a lot of manual configuration.

With multiple nova instances running within the devstack VM, it quickly becomes a resource hog on a laptop. Manually bringing up even the simplest topology is painful and error prone for the quick setup and tear-down I needed.

This is where I decided to test the new OVS Firewall driver using Mininet. Mininet provides an easy way to set up a network topology using Network Namespaces as hosts connected to OVS based switches (although other switch modes are also available).

The test setup for the Firewall Driver

For my tests, I needed a simple 2 node topology connected to an OVS switch. The firewall policies are pushed using OpenFlow rules. Mininet supports OVS switches either connected to SDN controllers or as standalone bridges called OVSBridge.


For the tests, we need a standalone OVS bridge, and the Firewall driver will be used to configure the flow rules.

To start the simulation, all we need is to specify the topology, and Mininet can deploy it with bridges and namespaces. The topology can either be custom made using host, switch and link definitions e.g. link1 or we can use one of the prepackaged topologies that come along with Mininet link2

Compared to the nova instance VMs this is very light on resources.

Configuration and access to hosts

One of the good things about mininet topology is that all the components come up with default config. The Namespaces are connected to the bridge and configured with IP address, all interfaces are up and IP connectivity is available as soon as the topology is deployed. Mininet also provides APIs to access the configuration. This helps in easy customization of the configuration and integration of the topology with the rest of the automation.

In the default configuration, the OVS bridge is configured with a single flow rule to make it work like a normal switch. For my tests, I had to remove this rule as more detailed flow configuration will be pushed by the Firewall driver based on the security policy.

It is also possible to run custom commands within the hosts. This is important as we need to start some process to open TCP ports to check the firewall policy. I used netcat to start a TCP server on each of the hosts and checked the connectivity through the firewall.


Here is a demo of the simulator in action. The SG and FWaaS policies are configured to allow ICMP and TCP port 8000 traffic. The policies are tested with firewall driver set to SG, FWaaS and both.


The following CLI can be used to view a running trace of the rule hit counts in OVS.

sudo watch 'ovs-ofctl dump-flows s1|grep -v n_packets=0|sed -r "s/\S+//2"|sed -r "s/\S+//5"| sed -r "s/\S+//1"'
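The pipeline above hides flows that were never hit and strips the cookie, duration and idle_age columns so the remaining fields are easier to scan. The same filtering can be sketched in Python on captured output. The sample text below is invented for illustration and only mimics the general layout of ovs-ofctl dump-flows output; unlike the shell pipeline, this sketch also skips the reply header line.

```python
import re

# Fields that only add noise when watching hit counters.
NOISY_FIELDS = ("cookie=", "duration=", "idle_age=")


def active_flows(dump: str) -> list:
    """Return the flows with a non-zero packet count, minus noisy fields."""
    rows = []
    for line in dump.splitlines():
        if "n_packets=" not in line:
            continue  # header or blank line, not a flow entry
        if re.search(r"\bn_packets=0\b", line):
            continue  # rule was never hit
        fields = [f for f in line.split() if not f.startswith(NOISY_FIELDS)]
        rows.append(" ".join(fields))
    return rows


# Invented sample, shaped like dump-flows output.
SAMPLE = """NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=9.050s, table=0, n_packets=0, n_bytes=0, idle_age=9, ip actions=drop
 cookie=0x0, duration=12.3s, table=0, n_packets=4, n_bytes=392, idle_age=1, icmp actions=NORMAL
"""

for row in active_flows(SAMPLE):
    print(row)  # table=0, n_packets=4, n_bytes=392, icmp actions=NORMAL
```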

The simulator code

The complete code for this automation is quite small and is available as a single file. It allows deploying the 2-node topology and configuring the firewall policy and once the configuration is done the user lands on the CLI prompt. This can be used to do the traffic tests.

The details of using the script can be found in the README








SecureNet: Simulating a Secure Network with Mininet

I have been working with OpenStack (devstack) for a while and I must say it is quite convenient to bring up a test setup using devstack. At times, though, I still feel it is overkill to use devstack for a quick test to verify my understanding of network or security rules, routing etc.

This is where Mininet shines. It is very light on resources and extremely fast in getting your topology up and running. It cuts the setup down to the absolute necessities and, needless to say, it has proven invaluable to me while trying out various topologies and tests.

Lacking Security Device simulation

The default Mininet toolbox comes with switch and host nodes. The switch is primarily a controller-based SDN switch like OpenVSwitch or IVS. The primary focus of Mininet has been L2 networks with SDN controllers.

For me, however, the goal was to test routing and security, much like what is available on OpenStack. I found an example topology in the Mininet code simulating a LinuxRouter. Basically, a Host node (Namespace) was configured to do L3 forwarding.

This gave me the initial idea of implementing security devices with Mininet; after all, the reference network services (routing and firewall) in OpenStack are based on Namespaces.

A Simulated Perimeter Firewall

As IPTables is available within the namespace, I decided to use it to implement a Perimeter Firewall that would inspect the traffic between the networks. I used a separate chain to hold my firewall rules and redirected all traffic hitting the FORWARD chain to my custom chain PFW. The last rule in the FORWARD chain was to drop all traffic.

So now when I started my topology with the perimeter Firewall the ping test confirmed that no traffic was flowing between the networks.


To allow traffic we need to specifically configure the Firewall to allow packets.
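The perimeter firewall layout described above (a custom PFW chain hooked into FORWARD, followed by a catch-all drop) can be sketched as a pair of helpers that build the iptables command lines. The function names and the example subnets are assumptions for illustration; the commands are only generated as strings here, not executed.

```python
def perimeter_setup(chain: str = "PFW") -> list:
    """Commands that create the custom chain and wire it into FORWARD."""
    return [
        f"iptables -N {chain}",             # create the custom chain
        f"iptables -A FORWARD -j {chain}",  # send forwarded traffic to it
        "iptables -A FORWARD -j DROP",      # drop whatever it did not accept
    ]


def allow_rule(src: str, dst: str, chain: str = "PFW") -> str:
    """Command that lets traffic from src to dst through the perimeter."""
    return f"iptables -A {chain} -s {src} -d {dst} -j ACCEPT"


for cmd in perimeter_setup():
    print(cmd)
# Hypothetical subnets on either side of the firewall.
print(allow_rule("10.0.1.0/24", "10.0.2.0/24"))
```

Packets that match no rule in PFW return to FORWARD and hit the final DROP, which is why nothing passes until an explicit allow rule is added.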


This took care of securing the Network boundary, but what about traffic flowing within the network? OpenStack supports this using micro segmentation with Security Groups.

Simulating Micro Segmentation with OpenStack like Security Groups

Well, as I was using OpenVSwitch for my topology in standalone mode, I decided to use the OVS packet filtering capabilities to implement a firewall within the network itself. Here is a sample OVS rule that does an L3 packet match and filter.

ovs-ofctl add-flow switch1 dl_type=0x0800,nw_src=,nw_dst=, action=DROP

This produces the following flow entry (it can be verified with ovs-ofctl dump-flows switch1)

cookie=0x0, duration=9.050s, table=0, n_packets=0, n_bytes=0, idle_age=9, ip,nw_src=,nw_dst= actions=drop

Updated CLI to allow Firewall Rules

All this looks good but it’s a lot of work to manually configure all the rules. And I was thinking, how can this be automated?

I extended the switch node in Mininet to implement a Secure Switch, and while at it I also implemented the Perimeter Firewall as a specialized Host Node. My intention in implementing these special nodes was to add the capability of configuring the nodes with security settings. By this time, it was already looking like an interesting Weekend Project 🙂

So, I got into it and implemented a set of CLI commands for Mininet to configure the firewall capabilities on the topology (using the security nodes that I introduced). I was calling it the Secure Network.

I wanted to keep things simple to start with and so kept the rule definitions very coarse, namely L3 filtering only (I may look at L4 in future). So here is the CLI I came up with.

Securenet] secrule add allow [addr1] to [addr2]
Securenet] secrule add deny [addr1] to [addr3]

Here addr1, addr2 and addr3 are Micro Segments (or Security Groups).

Now, to define Firewall rules with Micro Segments (let’s call them Security Groups, as this is what I was trying to simulate in the first place) I needed a way to define the Security Groups and associate Hosts with them.

To keep things simple I went for automatic creation of a Security Group when a Host association command is entered. This was done by extending the host commands. Here is an example of creating a Security Group and associating a Host with it.

Securenet] h30 bind sg1

The above command creates a Security Group called sg1 and associates host h30 with it.

To add another Host to the same SG, issue the same command with a different Host name.

Securenet] h31 bind sg1

This adds h31 to sg1. To view the members of the security group use the sghosts command

Securenet] sghosts sg1
0: h30
1: h31

Reacting to Security Group changes with events

By this time, I was already too excited to stop and was thinking of the interactions between the Firewall rules and Security Group definitions.

As the high-level Firewall rules are composed of Security Groups, the firewall must react to changes in the Security Group definitions. This means a change in a Security Group definition must produce some kind of event that is observable by the Firewall, which must then update its configuration accordingly to keep the firewall rule constraints satisfied.

To do this I extended the Topology and Mininet class so that any change in the SG definition can trigger a re-evaluation of the Firewall rules.


Again to keep things reasonably simple I went with a rip and replace of Firewall configuration (both IPTables and OVS) and did not bother about the traffic impact (we are in simulation after all). But this might be an interesting area to explore in future.
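The rip and replace idea can be sketched as follows. This is not the SecureNet code itself, just a minimal Python illustration under assumed names (SecureNetModel, bind, add_rule) and made-up host addresses: every change to a group or a rule throws away the generated flow list and rebuilds it from scratch.

```python
from itertools import product


class SecureNetModel:
    """Toy model: group-level rules expanded into per-host OVS flow strings."""

    def __init__(self):
        self.groups = {}  # group name -> list of member IP addresses
        self.rules = []   # (action, source group, destination group)
        self.flows = []   # flow strings currently considered installed

    def bind(self, host_ip: str, group: str) -> None:
        """Add a host to a group, creating the group on first use."""
        self.groups.setdefault(group, []).append(host_ip)
        self._rebuild()

    def add_rule(self, action: str, src_group: str, dst_group: str) -> None:
        self.rules.append((action, src_group, dst_group))
        self._rebuild()

    def _rebuild(self) -> None:
        """Rip and replace: regenerate every flow from the current state."""
        flows = []
        for action, src_group, dst_group in self.rules:
            verdict = "NORMAL" if action == "allow" else "drop"
            pairs = product(self.groups.get(src_group, []),
                            self.groups.get(dst_group, []))
            for src, dst in pairs:
                flows.append(f"ip,nw_src={src},nw_dst={dst},actions={verdict}")
        self.flows = flows


net = SecureNetModel()
net.add_rule("deny", "sg1", "sg2")  # no members yet, so no flows
net.bind("10.0.0.30", "sg1")
net.bind("10.0.0.40", "sg2")
net.bind("10.0.0.31", "sg1")        # membership change triggers a rebuild
for flow in net.flows:
    print(flow)
```

Rebuilding everything is wasteful but makes it easy to keep the installed flows consistent with the group definitions, which matches the trade-off accepted for the simulation.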

Simulating IPSets

One thing that I was missing while defining the Firewall rules was a way to target a set of external IP addresses. As the IP addresses associated with a Security Group were derived from its member Hosts, there was no way to define rules based on addresses that are not part of the topology.

So, another set of CLI commands was introduced to define groups based on IP addresses manually, without any topology-based constraints.

Securenet] secrule ipg add ipg1
Securenet] secrule ipg add ipg2
Securenet] secrule ipg add ipg3

CLI to view a list of IPGs

Securenet] secrule ipg list
0: ipg2
1: ipg3
2: ipg1

And to view its content use the following

Securenet] secrule ipg show ipg1
0: ipgh-

With IPGs it was possible to have rules with arbitrary addresses.


Here is a run of a simple set of commands on the secure net topology


The updated IPTables config in the Perimeter Firewall.


And the security rules in OVS


This was more of a fun long weekend project for me. Once I dived into the project I realized a number of challenges that are posed by such an undertaking like rule optimization, minimizing the config changes, targeting the right device to push a security rule, figuring out duplicate rules and conflicts etc. Also, as I thought through more use cases I realized that the firewall rule definition needs a language of its own.

But at the end of the weekend I feel mostly it was a lot of fun.

Control Plane for our virtual network

In the last two blogs, I have gone through the process of developing a VPN-based virtual network. One thing that we ignored is the amount of configuration that we need to change to add or remove nodes or provision new edge routers.

Some of these steps are part of infrastructure provisioning, like connecting the routers over L3 links and setting up the VPN tunnels. These can be considered static, one-time activities. Others are very transient, like adding a device to the network or adding another edge router.

To manage such a network manually is time consuming and error prone. In this blog, I will explore means to automate this activity.

Let’s talk about adding a device to our network

What does it take to add a new device to the network?


In terms of provisioning

  • First, we need to know the IP address associated with the device.
  • Next, we need to add a host route on all the edge routers (local and remote) so that the new device can communicate with the devices already present in the network.

The reverse needs to happen when a device is taken off the network. The host routes associated with the device need to be removed.

How about adding an Edge Router?

Adding a new edge router is a little more involved. Let’s say we have to add a new edge router to an existing network and also add a new device to it (because that is mostly the only reason to add a new edge router).

In terms of provisioning

  • The new edge router must add host routes for all the devices existing in the network.
  • Then it must follow the same steps as described earlier to add the new device to the network.

Removing the edge router will mean removing all devices connected to the edge router. In most cases removing the edge router with devices connected may not be a valid case (but we can always think of system failure which can cause such a situation).

How to automate?

From the above discussion, we can see that the job of adding and removing devices and edge routers is made of two tasks.

  • Distributing information about the device joining and leaving the network. i.e. distributing host routes when the device is added to the network and retracting the route when the device leaves
  • Adding host routes to the edge routers. This step is more about pushing(programming) the route into the routing table of the edge router

This job description exactly fits the skills of a routing protocol 🙂

BGP as control plane

BGP is well known for its ability to distribute routes and for its scalability. Let’s explore how BGP can be used for distributing the routes for both the cases above. Fortunately, BGP is designed for such a job and can already take care of cases like a new edge router joining the network.

For our example, we will use GoBGP to distribute the host routes. All the edge routers run BGP and peer with each other.

If you are interested in details, please look at my previous blogs exploring GoBGP and how to publish routes.

Adding a device node must trigger a route publish on BGP.

Adding an edge router is taken care of automatically: the new edge router forms BGP peering with the rest of the edge routers and receives all the existing host routes.
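The two behaviours above (announce or withdraw a host route when a device joins or leaves, and hand the full table to a router that joins later) can be illustrated with a toy in-memory model. This is not BGP and does not use GoBGP; the class names, router names, and addresses are assumptions made purely for illustration.

```python
class EdgeRouter:
    def __init__(self, name: str):
        self.name = name
        self.routes = {}  # device IP -> name of the edge router that owns it


class RouteExchange:
    """In-memory stand-in for the full mesh of peering sessions."""

    def __init__(self):
        self.routers = []
        self.table = {}  # every host route announced so far

    def join(self, router: EdgeRouter) -> None:
        """A new router receives all existing routes when it joins."""
        router.routes = dict(self.table)
        self.routers.append(router)

    def announce(self, router: EdgeRouter, device_ip: str) -> None:
        """A device joined behind this router: publish its host route."""
        self.table[device_ip] = router.name
        for peer in self.routers:
            peer.routes[device_ip] = router.name

    def withdraw(self, device_ip: str) -> None:
        """A device left: retract its host route everywhere."""
        self.table.pop(device_ip, None)
        for peer in self.routers:
            peer.routes.pop(device_ip, None)


exchange = RouteExchange()
vr1, vr2, vr3 = EdgeRouter("vrouter1"), EdgeRouter("vrouter2"), EdgeRouter("vrouter3")
exchange.join(vr1)
exchange.join(vr2)
exchange.announce(vr1, "20.0.0.1")
exchange.join(vr3)                 # late joiner still learns 20.0.0.1
print(vr2.routes, vr3.routes)
exchange.withdraw("20.0.0.1")
print(vr3.routes)                  # {}
```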

Other Options: Using a Controller

The other option is to use an SDN controller which can listen for events like devices joining or leaving, or new edge routers joining the network, and send commands to orchestrate the edge routers.

This is a more centralized and ground-up approach, but it gives you the flexibility of designing your own event handlers.

We have already discussed the events that need to be handled in the previous section, so I will not go into the details of a controller design (not in this post :))

Programming the routes

The second task in automating the device join/leave was programming the routes in the edge routers. This can be done with agents running on the edge routers which monitor BGP or listen for commands from the SDN controller and configure the routes.
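One way such an agent could work is to compare the host routes it has installed with the routes it has learned, and emit only the ip route commands needed to close the gap. The sketch below is a hypothetical helper, not part of GoBGP or any controller; the addresses are made up, and the commands are returned as strings rather than executed.

```python
def route_commands(installed: dict, desired: dict) -> list:
    """Commands that turn the installed host routes into the desired ones.

    Both arguments map a device IP to the next hop used to reach it.
    """
    cmds = []
    # Routes for devices that have left the network.
    for ip in sorted(installed.keys() - desired.keys()):
        cmds.append(f"ip route del {ip}/32")
    # New routes, or routes whose next hop changed.
    for ip in sorted(desired):
        if installed.get(ip) != desired[ip]:
            cmds.append(f"ip route replace {ip}/32 via {desired[ip]}")
    return cmds


installed = {"20.0.0.3": "10.0.0.2", "20.0.0.9": "10.0.0.2"}
desired = {"20.0.0.3": "10.0.0.2", "20.0.0.4": "10.0.0.2"}
for cmd in route_commands(installed, desired):
    print(cmd)
# ip route del 20.0.0.9/32
# ip route replace 20.0.0.4/32 via 10.0.0.2
```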

Designing a L2 virtual network

In the previous blog, we saw how we can connect two devices in the same subnet over a L3 device. With some configuration to enable ARP Proxying and host routes on the L3 device we were able to simulate a L2 network over a L3 device.

The important thing to note from the previous blog is that we did not make any change on the device nodes. All changes were made to the router.

So, what can we achieve with this capability?

In this blog, I will be taking that experience to the next level by designing a virtual L2 network using routers.

But most importantly we will be using only common Linux utilities and kernel features to achieve this.

Connecting nodes across multiple routers

Let’s start with setting up a simple topology for this experiment. We have two routers each connecting 2 device nodes. The routers themselves are connected to each other using a L3 link.


In a way, each router with its nodes is a replica of the setup from my first blog. The end nodes are devices while the middle nodes are the routers.

Device Interface IP Address
Node_1 node_if0
Node_2 node_if0
Vrouter1 vr_if1



Node_3 node_if0
Node_4 node_if0
Vrouter2 vr_if1



Look at the commands from my previous blog to see how to bring up this topology.

Making it work

To make it work, each router needs to be configured with Proxy ARP (refer to the previous blog). Each router must also be configured with host routes for all the device nodes. This means we have host routes for devices directly connected to the router and also for the ones that are connected via the second router.

Here is an example routing table listing with routes for directly connected devices.

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                U     0      0        0 eth1
                                                UH    0      0        0 vr_if1
                                                UH    0      0        0 vr_if2

Use the following commands to setup remote routes

vrouter1 > ip route add nexthop via
vrouter1 > ip route add nexthop via

Similar route configuration will be needed on vrouter2 to reach node_1 and node_2.

Caveat: I found that L2 learning did not work properly when I connected multiple routers. To rescue the situation, I added static ARP entries.

I manually created the ARP entries for directly connected devices on the router. Use the following command to set up the static ARP entries.

arp -s 20.0.0.${node_no} $node_mac

With the above taken care of, you should be able to ping nodes across the multiple routers.

node_1 > ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=62 time=437 ms
64 bytes from icmp_seq=2 ttl=62 time=1.02 ms
64 bytes from icmp_seq=3 ttl=62 time=0.947 ms
--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.947/146.511/437.563/205.804 ms

L3 fabric as a transport

Now that we have a workable L2 network that can span across multiple routers, is there anything else required to make this solution work in a generic environment?

If you look closely, you will realize that although the routers in our experiment are connected to each other over a L3 link, the host routes in our end routers depend on knowledge of the exact path connecting them. For example, if we introduce a third router between them, our solution will not work, because the remote host routes now need to be adjusted to account for the extra hop introduced by the third router. Additionally, all the routers in the path connecting the end routers need to be aware of the host routes for the device nodes.

The VPN tunnels to rescue

The way to solve this situation is to set up tunnels between the routers connecting the device nodes. In this way, the L3 path is isolated from knowing the host routes, and the host routes are isolated from the impact of changes in the L3 path that connects the end routers.

We can setup GRE tunnels between the end routers, but we can also make the connections over SSH tunnels.

Use the following commands for setting up GRE

vrouter1 > ip tunnel add tun0 mode gre remote local ttl 255
vrouter1 > ifconfig tun0 up


Use the SSH based tunnel as described at

So now we have our L3 Virtual Private Network, which isolates us from the impact of changes in the path. Make sure that you use the tunneled path for reaching the remote nodes. To do this, adjust the host routes to use the tunnel paths.

vrouter1 > ip route add nexthop via
vrouter1 > ip route add nexthop via

Putting it all together

Finally, this is how the complete topology looks.


Here is a summary of the IP addresses and interfaces on each node.


Device Interface IP Address
Node_1 node_if0
Node_2 node_if0
Vrouter1 vr_if1




Node_3 node_if0
Node_4 node_if0
Vrouter2 vr_if1




The routing table with tunnel paths enabled looks like the following.


And the pings are still working 🙂

node_1 > ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=62 time=290 ms
64 bytes from icmp_seq=2 ttl=62 time=1.01 ms
64 bytes from icmp_seq=3 ttl=62 time=0.973 ms
64 bytes from icmp_seq=4 ttl=62 time=1.17 ms
--- ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.973/73.499/290.832/125.477 ms


Using a Router for Switching

We all know that Routers are Layer3 devices and switches are Layer2. So how can a Layer3 device be used to connect two or more devices at Layer2?

In this blog, I will explore the mechanisms that make it possible to use a routing device as a switch.

The test setup

We start with a simple setup with three Linux Network Namespaces connected in series. The middle node acts as a Router and the end nodes act as the devices that need Layer2 connectivity.

Use the following script for setting up the Namespaces and connections.!api/2.0/snippets/xchandan/Bazz4/0a29fd79dc2ad2be4c816f8db788abfed3ac9282/files/


The script adds the following IP addresses to the device nodes.

Device Interface IP Address
Node_1 node_if0
Node_2 node_if0
Vrouter1 vr_if1




In this setup, we have two Layer2 broadcast domains, A and B, connected using the router.


It doesn’t work, testing what’s going on

Now we have two devices in the same Subnet connected using a Router instead of a switch. Start a ping from one device node to the other and, predictably, it does not work. But running tcpdump can give you an insight into the traffic flow.

node_1 > ping
PING ( 56(84) bytes of data.
From icmp_seq=1 Destination Host Unreachable
From icmp_seq=2 Destination Host Unreachable
From icmp_seq=3 Destination Host Unreachable
--- ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms

And the following is the tcpdump output

vrouter1 > sudo tcpdump -i vr_if1 -n
tcpdump: WARNING: vr_if1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vr_if1, link-type EN10MB (Ethernet), capture size 65535 bytes
13:29:08.265464 ARP, Request who-has tell, length 28
13:29:09.263145 ARP, Request who-has tell, length 28
13:29:10.263058 ARP, Request who-has tell, length 28
13:29:11.280501 ARP, Request who-has tell, length 28
13:29:12.278951 ARP, Request who-has tell, length 28

The ARP requests are not resolving.

As the device nodes are part of the same subnet, when we try to ping one device node from another, the first device node sends ARP queries to find the Layer2 address for the destination IP. ARP uses a Layer2 broadcast to find the MAC address of the destination. As our destination is not part of the same Layer2 domain, the ARP broadcast never reaches it and is never answered.
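The reason the sender ARPs for the destination itself, rather than for a gateway, is that the destination falls inside its own subnet. A short Python sketch of that decision using the standard ipaddress module is shown below. The 20.0.0.x addresses and the /24 prefix length are assumptions for illustration, as is the helper name.

```python
import ipaddress


def arp_target(own_interface: str, dst: str, gateway: str = None) -> str:
    """Which address a host will ARP for when sending to dst."""
    iface = ipaddress.ip_interface(own_interface)
    target = ipaddress.ip_address(dst)
    if target in iface.network:
        # Same subnet: the host assumes dst is on its own Layer2 segment.
        return f"ARP for {target} directly"
    if gateway is None:
        return "no route"
    return f"ARP for gateway {gateway}"


# node_1 sending to node_2 (assumed addresses in the same /24).
print(arp_target("20.0.0.1/24", "20.0.0.2"))               # ARP for 20.0.0.2 directly
# A destination outside the subnet would go through a gateway instead.
print(arp_target("20.0.0.1/24", "192.0.2.9", "20.0.0.254"))  # ARP for gateway 20.0.0.254
```

In the test setup the first case applies, and since the router does not forward Layer2 broadcasts, nobody answers the query.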

The missing link that will make it work

We saw in the previous section the problem of unresolved ARP queries.

So how do we rescue this situation?

To make this setup work we need to enable Proxy ARP on the Router. When Proxy ARP is enabled on the router interfaces, each interface starts responding to the ARP queries from the device nodes with its own MAC (Layer2) address. In a way, it proxies for a device (with the destination IP) which is actually not present on the Layer2 network.

To enable proxy ARP on the router, use the following commands.

vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if1/proxy_arp
vrouter1 > echo 1 > /proc/sys/net/ipv4/conf/vr_if2/proxy_arp

Taste of success

So now the ARP problem is resolved. The device node gets a reply to its ARP query with a MAC address which actually belongs to the interface of the router. The device now knows which MAC address is associated with the destination IP.

But the pings are still failing.

This is because we have solved only half the problem.

What does the Router do with the packet that it received? Once the Layer2 frame arrives on the router, it removes the Layer2 header and looks at the IP address. It obviously does not hold the destination IP.

So, the router does what the router does with any other IP packet, looks up the routing table.

To make the ping work, we will have to add host routes to the destination IP address on the router.

The following commands will achieve our goal.

vrouter1 > ip route add dev vr_if1
vrouter1 > ip route add dev vr_if2

This is what the routing table on the router looks like

vrouter1 > route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UH    0      0        0 vr_if1
                                                UH    0      0        0 vr_if2
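With the host routes in place, the router's lookup picks the outgoing interface by finding the most specific route that contains the destination. A minimal sketch of that lookup with the standard ipaddress module follows. The 20.0.0.x addresses are assumptions; only the interface names come from the table above.

```python
import ipaddress


def lookup(routes, dst: str):
    """Return the interface of the longest-prefix route matching dst."""
    target = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in routes if target in net]
    if not matches:
        return None  # no route: the packet cannot be forwarded
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Host routes (/32), one per device node, with assumed addresses.
routes = [
    (ipaddress.ip_network("20.0.0.1/32"), "vr_if1"),
    (ipaddress.ip_network("20.0.0.2/32"), "vr_if2"),
]

print(lookup(routes, "20.0.0.2"))  # vr_if2
print(lookup(routes, "20.0.0.7"))  # None
```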

One more thing to keep in mind. Don’t forget to enable IP forwarding on your router.

vrouter1 > echo 1 > /proc/sys/net/ipv4/ip_forward

Finally, we can test the end to end traffic flow 🙂


node_1 > ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=63 time=0.167 ms
64 bytes from icmp_seq=2 ttl=63 time=0.061 ms
--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.061/0.114/0.167/0.053 ms



Test driving App Firewall with IPTables

With more and more applications moving to the cloud, web based applications have become ubiquitous. They are ideal for providing access to applications sitting on the cloud (over HTTP through a standard web browser). This has removed the need to install specialized applications on the client system; the client just needs a fairly modern browser.

While this is good for reducing load on the client, the job of the firewall has become much tougher.

Traditionally, firewall rules look at the Layer 3 and Layer 4 attributes of a packet to identify a flow and associate it with the application generating the traffic. To a traditional firewall looking at L3/L4 headers, all the traffic between the client and different web apps looks like HTTP communication. Without proper classification of traffic flows the firewall is not able to apply a security policy.

It has now become important to look at the application layer to identify the traffic associated with a web service or web application and enforce effective security and bandwidth allocation policy.

In this blog, I will look at features provided by IPTables that can be used to classify packets by application Layer header and how this can be used to implement security and other network policy.

Looking in to the application layer

IPTables is the de-facto choice for implementing a firewall on Linux. It provides extensive packet matching, classification, filtering and many more facilities. Like any traditional firewall, the core features of IPTables allow packet matching on Layer3 and Layer4 header attributes. These features, as we discussed in the introduction, may not be sufficient to differentiate between traffic from various web apps.

While researching a solution to provide an App Firewall on Linux I came across an IPTables extension called NFQUEUE.

The NFQUEUE extension provides a mechanism to pass a packet to a user-space program which can run some kind of test on the packet and tell IPTables what action(accept/drop/mark) to perform for the packet.


This gives a lot of flexibility for the IPTables user to hook up custom tests for the packets before it is allowed to pass through the firewall.

To understand how NFQUEUE can help classify and filter traffic based on application layer headers, let’s try to implement a web app filter providing URL-based access control. In this test we will extract the request URL from the HTTP header, and the filter will allow or deny access based on this URL.
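The URL check at the heart of such a filter can be sketched without any packet plumbing. The snippet below is a minimal Python 3 illustration and not the filter built later in this post: the helper names extract_request_uri and verdict_for and the ALLOWED_MARKERS policy are assumptions made for the example, and it only handles a request line that arrives complete in a single TCP payload.

```python
# Minimal sketch: pull the request URI out of a raw HTTP payload and
# map it to an allow/deny decision. Names and policy are illustrative.

ALLOWED_MARKERS = ("APP1",)  # hypothetical policy: URIs containing these pass
HTTP_METHODS = (b"GET", b"POST", b"PUT", b"DELETE", b"HEAD", b"OPTIONS", b"PATCH")


def extract_request_uri(payload):
    """Return the URI from an HTTP request line, or None if not a request."""
    first_line = payload.split(b"\r\n", 1)[0]
    parts = first_line.split()
    # A request line looks like: METHOD SP URI SP HTTP/x.y
    if len(parts) != 3 or parts[0] not in HTTP_METHODS:
        return None
    if not parts[2].startswith(b"HTTP/"):
        return None
    return parts[1].decode("ascii", errors="replace")


def verdict_for(payload):
    """Return 'accept' or 'drop' for one TCP payload."""
    uri = extract_request_uri(payload)
    if uri is None:
        # Not an HTTP request (e.g. an empty handshake segment): let it pass
        return "accept"
    if any(marker in uri for marker in ALLOWED_MARKERS):
        return "accept"
    return "drop"


if __name__ == "__main__":
    print(verdict_for(b"GET /APP1 HTTP/1.1\r\nHost: example\r\n\r\n"))
    print(verdict_for(b"GET /APP2 HTTP/1.1\r\nHost: example\r\n\r\n"))
```

Accepting payloads that are not HTTP requests is a deliberate choice in this sketch: the TCP handshake carries no payload, and dropping it would prevent any request from ever reaching the filter.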

A simple web app

For this experiment, we will use Python bottle to deploy two applications. Access will be allowed for the first app (APP1), while access to the second app (APP2) will be denied.

We will use the following code to deploy the sample Apps

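from bottle import Bottle

app1 = Bottle()
app2 = Bottle()

# NOTE: the route paths below are an assumption; the filter later in this
# post only requires that the request URL contains "APP1" or "APP2".
@app1.route('/APP1')
def app1_route():
    return 'Access to APP1!\n'

@app2.route('/APP2')
def app2_route():
    return 'Access to APP2!\n'

if __name__ == '__main__':
    # Serve the routes of both apps from a single server on port 8081
    app1.merge(app2)
    app1.run(server='eventlet', host="", port=8081)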

The web application will bind to port 8081 and local IP of

NOTE: we need an eventlet-based bottle server, otherwise the application hangs after a deny from the app filter (connections are not closed and the next request is not processed).

To access the web apps use the curl commands


Configuring the IPTables NFQUEUE

The next step is to configure IPTables to forward the client traffic accessing the web apps to our user space web-app filter.

The NFQUEUE IPTables extension works by adding a new target to IPTables called NFQUEUE. This target allows IPTables to put the matching packet on a queue. These packets can then be read from this queue by a filter application in user space. The filter application can then perform custom tests and provide a verdict to allow or deny the packet.

The NFQUEUE extension supports queue numbers from 0 to 65535. It also provides fail-safe options, such as what action IPTables should take if a queue is created but no filter is attached to it, and load balancing of packets across multiple queues. There are also knobs in the /proc filesystem to control how much of the packet data will be copied to user space. A complete list of options can be found in the iptables-extensions man page.

To enable NFQUEUE for the web-app traffic we will add the following rule to IPTables.

iptables -I INPUT -d -p tcp --dport 8081 -j NFQUEUE --queue-num 10 --queue-bypass

The --queue-num option selects the NFQUEUE number to which the packet will be queued. The --queue-bypass option allows the packet to be accepted if no custom filter is attached to queue number 10. Without this option, packets are dropped when no filter is attached to the queue.

Implementing a simple APP filter

With the above IPTables rule the packets destined for our sample web app will be pushed into NFQUEUE number 10.  I am going to use the python bindings for NFQUEUE called nfqueue-bindings to develop the filter. Let’s run a simple print and drop filter.


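# need root privileges

import struct
import sys
import time

from socket import AF_INET, AF_INET6, inet_ntoa

import nfqueue
from dpkt import ip

def cb(i, payload):
    print "python callback called !"
    # Drop every packet that reaches the queue
    payload.set_verdict(nfqueue.NF_DROP)
    return 1

def main():
    q = nfqueue.queue()
    print "setting callback"
    q.set_callback(cb)
    print "open"
    # Bind to NFQUEUE number 10 (matches --queue-num in the IPTables rule)
    q.fast_open(10, AF_INET)
    print "trying to run"
    try:
        q.try_run()
    except KeyboardInterrupt, e:
        print "interrupted"
    print "unbind"
    q.unbind(AF_INET)
    print "close"
    q.close()

if __name__ == '__main__':
    main()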

Now we have verified that the packets trying to access our web app pass through an app filter implemented in user space. Next, we need to unpack the packet and look at the HTTP header to extract the URL that the user is trying to access. For unpacking the headers we will use the Python dpkt library. The following code allows access to APP1 and denies access to APP2.


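# need root privileges

import struct
import sys
import time

from socket import AF_INET, AF_INET6, inet_ntoa

import nfqueue
import dpkt
from dpkt import ip

count = 0

def cb(i, payload):
    global count
    count += 1
    data = payload.get_data()

    pkt = ip.IP(data)
    if pkt.p == ip.IP_PROTO_TCP:
        # print "  len %d proto %s src: %s:%s    dst %s:%s " % (
        #        payload.get_length(),
        #        pkt.p, inet_ntoa(pkt.src), pkt.tcp.sport,
        #        inet_ntoa(pkt.dst), pkt.tcp.dport)
        tcp_pkt = pkt.data      # TCP segment carried by the IP packet
        app_pkt = tcp_pkt.data  # application payload carried by TCP
        try:
            request = dpkt.http.Request(app_pkt)
            if "APP1" in request.uri:
                print "Allowing APP1"
                payload.set_verdict(nfqueue.NF_ACCEPT)
            elif "APP2" in request.uri:
                print "Denying APP2"
                payload.set_verdict(nfqueue.NF_DROP)
            else:
                print "Denying by default"
                payload.set_verdict(nfqueue.NF_DROP)
        except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
            # Not a parsable HTTP request (e.g. TCP handshake segments).
            # Accepting these is an assumption made here so that the
            # connection can be established and the request can arrive.
            payload.set_verdict(nfqueue.NF_ACCEPT)
        print "  len %d proto %s src: %s    dst %s " % (
               payload.get_length(), pkt.p, inet_ntoa(pkt.src),
               inet_ntoa(pkt.dst))

    return 1

def main():
    q = nfqueue.queue()

    print "setting callback"
    q.set_callback(cb)

    print "open"
    q.fast_open(10, AF_INET)

    print "trying to run"
    try:
        q.try_run()
    except KeyboardInterrupt, e:
        print "interrupted"

    print "%d packets handled" % count

    print "unbind"
    q.unbind(AF_INET)
    print "close"
    q.close()

if __name__ == '__main__':
    main()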

Here are the results of the test on the client


The output from the filter on the firewall


What else can be done with App based traffic classification

A firewall is just one use case of advanced packet classification. With flows identified and associated with different applications, we can apply different routing and forwarding policies. An NFQUEUE-based filter can set different firewall marks on the classified packets, and these marks can then be used to implement policy-based routing in Linux.
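As a rough sketch of the classification step that would feed such marks, the Python 3 snippet below maps a request URI to a firewall mark. The APP_MARKS table, the mark values and the mark_for_uri helper are hypothetical; how the mark is actually attached to the packet depends on the NFQUEUE binding in use and is not shown here.

```python
# Hypothetical mapping from an application marker (found in the request URI)
# to a firewall mark. The values are made up for illustration.
APP_MARKS = {"APP1": 0x1, "APP2": 0x2}
DEFAULT_MARK = 0x0  # unclassified traffic keeps the default mark


def mark_for_uri(uri):
    """Return the firewall mark for a request URI."""
    for marker, mark in APP_MARKS.items():
        if marker in uri:
            return mark
    return DEFAULT_MARK


if __name__ == "__main__":
    for uri in ("/APP1", "/APP2/status", "/other"):
        print(uri, hex(mark_for_uri(uri)))
```

Each mark value could then be tied to its own routing table with an ip rule that matches on fwmark.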

IPTables: Matching A GRE packet based on tunnel key

I was trying to figure out a way to match packets with a certain GRE key and take some action. IPTables does not provide a direct solution to this problem, but it has the u32 extension module, which can extract 4 bytes from a packet at a given offset and match them against a pattern.

So, I decided to give a try to this extension.

Prepare the setup

I created a tunnel between two of my VMs and assigned IP addresses to the tunnel interfaces.

On VM1

sudo ip tunnel add tun2 mode gre remote local ttl 255 key 22

sudo ifconfig tun2 up

On VM2

sudo ip tunnel add tun2 mode gre remote local ttl 255 key 22

sudo ifconfig tun2 up

Start with a basic rule

Next, I created an IPTables rule on the receiving system to generate logs for matching packets. You could also create an ACCEPT rule and check the built-in packet counter for the rule.

sudo iptables -I INPUT -p 47 -m limit --limit 20/min -j LOG --log-prefix "IPT GRE" --log-level 4

Now start a ping from VM1 to VM2


You can keep a watch on the packet counters with the following command

watch "sudo iptables -L -v -n"

The GRE header

Next, a look at the GRE header format (taken from the RFC). The header contains an optional 32-bit key, which is the field we are interested in.

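 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C| |K|S| Reserved0       | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Reserved1 (Optional)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Key (optional)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Sequence Number (Optional)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+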
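To get a feel for the layout above, here is a minimal Python 3 sketch that packs and parses a GRE header with the standard struct module. The helper names build_gre_header and parse_gre_key are made up for this example, and the sketch ignores the sequence number field entirely.

```python
import struct

# Flag bits in the first 16 bits of the GRE header
GRE_FLAG_CHECKSUM = 0x8000  # C bit
GRE_FLAG_KEY = 0x2000       # K bit
GRE_FLAG_SEQUENCE = 0x1000  # S bit (defined for reference, unused below)


def build_gre_header(protocol_type, key=None):
    """Pack a GRE header without checksum or sequence number."""
    flags = GRE_FLAG_KEY if key is not None else 0
    header = struct.pack("!HH", flags, protocol_type)
    if key is not None:
        header += struct.pack("!I", key)
    return header


def parse_gre_key(header):
    """Return the GRE key, or None when the K bit is not set."""
    flags, _protocol_type = struct.unpack_from("!HH", header, 0)
    if not flags & GRE_FLAG_KEY:
        return None
    offset = 4
    if flags & GRE_FLAG_CHECKSUM:
        offset += 4  # Checksum and Reserved1 come before the key
    (key,) = struct.unpack_from("!I", header, offset)
    return key


if __name__ == "__main__":
    header = build_gre_header(0x0800, key=22)  # 0x0800: IPv4 payload
    print(header.hex(), parse_gre_key(header))
```

With key 22 and no checksum, the key occupies bytes 4 to 7 of the GRE header, which is why it ends up at offset 24 once a 20-byte IP header is placed in front of it.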

Run the following tcpdump command to capture the packets (my VMs don’t have a GUI)

sudo tcpdump -s 0 -n -i ens3 proto GRE -w dump.pcap

The captured packets can be analyzed using wireshark


Understanding the iptables u32 extension match rule

Basically, the u32 module extracts 4 bytes of data from the packet at a given offset (counted from the start of the IP header) and matches it against a given hex number or range. Here is an example of a u32 match rule from the man page; it matches packets whose total length falls within a certain range. The man page describes the format of the rule: you provide an offset, u32 extracts 4 bytes starting at that offset, ANDs them with the mask, and finally compares the result with the hex value.


match IP packets with total length >= 256
The IP header contains a total length field in bytes 2-3.

--u32 "0 & 0xFFFF = 0x100:0xFFFF"

read bytes 0-3
AND that with 0xFFFF (giving bytes 2-3), 
and test whether that is in the range [0x100:0xFFFF]

The man page has more details.
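The semantics of this simple form of the test can be emulated in a few lines of Python 3, which helps when checking an expression before loading it into IPTables. This is only a sketch of the "offset & mask = value or range" form. The u32_test and ipv4_header helpers are invented for the example and do not cover the shift, jump or "&&" operators of the real module.

```python
import struct


def u32_test(packet, offset, mask, low, high=None):
    """Emulate a simple u32 test of the form 'offset & mask = low[:high]'."""
    if high is None:
        high = low
    if offset < 0 or offset + 4 > len(packet):
        return False  # not enough data to read 4 bytes: treat as no match
    (word,) = struct.unpack_from("!I", packet, offset)  # big-endian 32 bits
    return low <= (word & mask) <= high


def ipv4_header(total_length, protocol=47):
    """Build a 20-byte IPv4 header with zeroed addresses and checksum."""
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, total_length, 0, 0, 64, protocol, 0,
                       b"\x00" * 4, b"\x00" * 4)


if __name__ == "__main__":
    # Same test as the man page example: "0 & 0xFFFF = 0x100:0xFFFF"
    for length in (100, 256, 300):
        print(length, u32_test(ipv4_header(length), 0, 0xFFFF, 0x100, 0xFFFF))
```

Bytes 0 to 3 of the header hold the version/IHL, the TOS and the total length, so masking with 0xFFFF leaves just the total length for the range comparison.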

Craft a match for GRE Key

The IP header is 20 bytes long and the GRE key starts at byte 24, as can be confirmed in Wireshark. The offsets in the rule are counted from the start of the IP header (highlighted in the Wireshark screenshot).


Based on the example from the man page I crafted the following rule to match the GRE key.

sudo iptables -I INPUT -p 47 -m u32 --u32 "24 & 0xFFFFFFFF = 0x16" -m limit --limit 20/min -j LOG --log-prefix "IPT GRE key 22" --log-level 4

Checking for Key Present Flag

But the key is optional, so add a match for the Key Present flag.

sudo iptables -I INPUT -p 47 -m u32 --u32 "20 & 0x20000000 = 0x20000000 && 24 & 0xFFFFFFFF = 0x16" -m limit --limit 20/min -j LOG --log-prefix "IPT GRE key 22" --log-level 4

Here is a screen capture of the iptables packet counters

Chain INPUT (policy ACCEPT 294K packets, 78M bytes)
 pkts bytes target prot opt in out source destination
 711 79632 LOG 47 -- * * u32 "0x14&0x20000000=0x20000000&&0x18&0xffffffff=0x16" limit: avg 20/min burst 5 LOG flags 0 level 4 prefix "IPT GRE key 22"

The above rule is simplistic and good enough to get you started, but it has shortcomings, e.g. it assumes a constant IP header length.

The man page describes examples of how to handle variable length headers, fragmentation check etc.
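To illustrate what handling these cases involves, here is a user-space Python 3 sketch that locates the GRE key without assuming a 20-byte IP header. It is not the u32 syntax from the man page. The function name gre_key_from_ip_packet is made up for this example, and the sketch only handles IPv4.

```python
import struct

IPPROTO_GRE = 47
GRE_FLAG_CHECKSUM = 0x8000  # C bit
GRE_FLAG_KEY = 0x2000       # K bit


def gre_key_from_ip_packet(packet):
    """Return the GRE key of an IPv4/GRE packet, or None if absent."""
    if len(packet) < 20:
        return None
    version_ihl = packet[0]
    if version_ihl >> 4 != 4 or packet[9] != IPPROTO_GRE:
        return None
    (frag_field,) = struct.unpack_from("!H", packet, 6)
    if frag_field & 0x1FFF:
        return None  # non-first fragment: it carries no GRE header
    ip_header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    if ip_header_len < 20 or len(packet) < ip_header_len + 4:
        return None
    (flags,) = struct.unpack_from("!H", packet, ip_header_len)
    if not flags & GRE_FLAG_KEY:
        return None
    key_offset = ip_header_len + 4
    if flags & GRE_FLAG_CHECKSUM:
        key_offset += 4  # Checksum and Reserved1 come before the key
    if len(packet) < key_offset + 4:
        return None
    (key,) = struct.unpack_from("!I", packet, key_offset)
    return key


if __name__ == "__main__":
    gre = bytes.fromhex("2000080000000016")  # K bit set, key 22
    plain_ip = bytes([0x45, 0, 0, 0, 0, 0, 0, 0, 64, 47, 0, 0]) + bytes(8)
    ip_with_options = bytes([0x46, 0, 0, 0, 0, 0, 0, 0, 64, 47, 0, 0]) + bytes(12)
    print(gre_key_from_ip_packet(plain_ip + gre))
    print(gre_key_from_ip_packet(ip_with_options + gre))
```

The second packet has a 24-byte IP header, so a rule with hard-coded offsets 20 and 24 would read the wrong bytes, while the IHL-based offset still finds the key.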

Running a standalone OpenStack Neutron server

One of the great advantages for an OpenStack developer is the ease with which a dev environment can be created. I cannot say enough good things about devstack, a tool that provides a very flexible way of creating a development environment for OpenStack.

Devstack can be configured using a simple config file (local.conf). Another advantage of running a devstack-based environment is that it hardly needs any special hardware. A VM on your laptop is good enough to bring up an all-in-one OpenStack environment, although a good amount of RAM and CPU for your VM will yield better results.

As a developer interested in OpenStack networking (Neutron), my interest lies mostly in the Neutron service, and most of the time I find that a lot of OpenStack services are not really required for my day-to-day activity. So I decided to tweak my devstack config file to start only the minimum set of services, just enough to run the networking service, and save a little on my devstack VM's RAM and CPU requirements.

The OpenStack networking service itself depends on a common set of infrastructure services such as the database server, RabbitMQ, etc. The following is the local.conf that I used for this purpose.

#Q_PLUGIN_EXTRA_CONF_PATH += '/etc/neutron/fwaas_driver.ini'

enable_service rabbit
enable_service database
enable_service mysql
enable_service infra
enable_service keystone
enable_service q-svc
enable_service neutron

With the above local.conf only the network service and basic infrastructure are started by devstack. Here is a list of windows in my screen session

0$ shell  1$(L) key  2$(L) key-access  3$(L) q-svc   4$ code*  5-$ log

The last two windows were created manually for browsing the code and looking at my custom logs.