In this technical deep dive into iptables, the Linux network security configuration utility, we'll see why and how to build a sophisticated TCP router and load balancer suited to handling IoT application traffic.
Most Platform as a Service (PaaS) offerings are limited to hosting web applications reachable over HTTP. However, in memory-, CPU- and battery-constrained environments such as the IoT world, HTTP is rarely used. A custom, fast and lightweight TCP-based protocol is usually preferred.
When you think about it, the "BUILD" and "RUN" stages of such an application are very similar to those of a web application. Programming languages (Node.js especially) and databases are usually shared as well. Therefore, the only thing a PaaS is missing to host IoT apps is a TCP routing layer.
This TCP routing layer must be able to do the following operations:
For HTTP routing, Scalingo uses OpenResty. However, it cannot be used for TCP routing (or so we thought, see the conclusion). That's why we chose another approach based on iptables.
First, let's define our networks. In this article, we will consider two distinct networks: a public network, 192.168.1.0/24, where the clients are, and a private network, 10.0.0.0/24, where the servers are (they host the app containers).
The public network has one client with the IP 192.168.1.2, and the private network has three servers with the IPs 10.0.0.2, 10.0.0.3 and 10.0.0.4.
The last part of the setup is a front server that links both networks, with the IPs 10.0.0.1 and 192.168.1.1.
In the following sections, we will assume that every operation and command takes place on the front server unless stated otherwise.
Let's start by redirecting all traffic arriving on TCP port 27017 of the 192.168.1.1 IP to port 1234 of the 10.0.0.2 server in the private network.
This is done via a process called Network Address Translation (or NAT). In this article we will focus on two different NAT methods: DNAT and SNAT.
DNAT (Destination NAT) changes the destination fields of the IP and TCP headers.
Here, both headers must be rewritten: the destination IP of our packet becomes 10.0.0.2 and the destination port becomes 1234.
The following transformation happens:
   PACKET RECEIVED                      PACKET FORWARDED
|---------------------|              |---------------------|
| IP PACKET           |              | IP PACKET           |
|                     |              |                     |
| SRC: 192.168.1.2    |              | SRC: 192.168.1.2    |
| DST: 192.168.1.1    |              | DST: 10.0.0.2       |
| |---------------|   |              | |---------------|   |
| | TCP PACKET    |   |  =(DNAT)=>   | | TCP PACKET    |   |
| | DPORT: 27017  |   |              | | DPORT: 1234   |   |
| | SPORT: 23456  |   |              | | SPORT: 23456  |   |
| | ... DATA ...  |   |              | | ... DATA ...  |   |
| |---------------|   |              | |---------------|   |
|---------------------|              |---------------------|
To do so, we need to add a rule to the PREROUTING chain of the nat table of iptables.
# Append (-A) a rule to the PREROUTING chain of the nat table (-t nat).
# It applies only to TCP packets (-p tcp) whose destination is
# 192.168.1.1 (-d) on port 27017 (--dport), and uses the DNAT target (-j)
# to rewrite the IP and TCP destination headers to 10.0.0.2:1234.
iptables \
  -t nat -A PREROUTING \
  -p tcp \
  -d 192.168.1.1 --dport 27017 \
  -j DNAT --to-destination 10.0.0.2:1234
That's all. Now, if we connect to the front server on port 27017, our traffic should be redirected to our server.
If we try that on the client:
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
This command hangs, and the server shows nothing.
By looking at the packets received by Server 1, we can see that the iptables rule worked and the traffic has been redirected to the correct destination.
user@server-1 ~ $ tcpdump -i eth1
15:19:17.832609 IP 192.168.1.2.23456 > 10.0.0.2.1234: Flags [S],
seq 37761180, win 29200, options [mss 1460,sackOK,
TS val 21306607 ecr 0,nop,wscale 6], length 0
The command hung because the server does not know how to reply to the client: the source IP is still 192.168.1.2, which is not on its network.
The solution is to also modify the source IP and source port headers on the front server. This is done using the SNAT method.
The following transformations will occur:
  PACKET RECEIVED                                                 PACKET FORWARDED
|-------------------|           |-------------------|           |-------------------|
| IP PACKET         |           | IP PACKET         |           | IP PACKET         |
|                   |           |                   |           |                   |
| SRC: 192.168.1.2  |           | SRC: 192.168.1.2  |           | SRC: 10.0.0.1     |
| DST: 192.168.1.1  |           | DST: 10.0.0.2     |           | DST: 10.0.0.2     |
| |---------------| |           | |---------------| |           | |---------------| |
| | TCP PACKET    | | =(DNAT)=> | | TCP PACKET    | | =(SNAT)=> | | TCP PACKET    | |
| | DPORT: 27017  | |           | | DPORT: 1234   | |           | | DPORT: 1234   | |
| | SPORT: 23456  | |           | | SPORT: 23456  | |           | | SPORT: 38921  | |
| | ... DATA ...  | |           | | ... DATA ...  | |           | | ... DATA ...  | |
| |---------------| |           | |---------------| |           | |---------------| |
|-------------------|           |-------------------|           |-------------------|
SNAT takes place after all routing decisions (including our DNAT rule) have been made, so we need to add the SNAT rule to the POSTROUTING chain of the nat table.
# Apply this rule to TCP packets going to the IP 10.0.0.2 (-d) on port
# 1234 (--dport), and use the SNAT target (-j) to change the source IP
# header to 10.0.0.1 (--to-source).
iptables \
  -t nat -A POSTROUTING \
  -p tcp \
  -d 10.0.0.2 --dport 1234 \
  -j SNAT --to-source 10.0.0.1
Thanks to connection tracking, iptables keeps a translation table in memory and automatically handles the packets coming back from the server, rewriting them and sending them to the client.
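To make the role of this translation table concrete, here is a toy model of the reverse translation, using the addresses and ports from the diagram above. It is only a sketch: the one-line TABLE variable and the reverse_nat helper are hypothetical names invented for illustration, and the kernel's real connection table holds much more state.

```shell
#!/bin/sh
# Toy connection table: one line per tracked connection, holding the tuple
# as seen on the client side and the tuple as seen on the server side.
# Fields: client_src client_dst server_src server_dst
TABLE='192.168.1.2:23456 192.168.1.1:27017 10.0.0.1:38921 10.0.0.2:1234'

# reverse_nat REPLY_SRC REPLY_DST
# Look up a reply sent by a server and print it as the client will see it.
reverse_nat() {
  printf '%s\n' "$TABLE" | awk -v src="$1" -v dst="$2" '
    # A reply matches when it goes from server_dst back to server_src.
    $4 == src && $3 == dst { print $2 " > " $1; found = 1 }
    END { if (!found) print "no matching connection" }'
}

reverse_nat 10.0.0.2:1234 10.0.0.1:38921
# prints: 192.168.1.1:27017 > 192.168.1.2:23456
```

The reply leaves Server 1 addressed to 10.0.0.1:38921, and the client receives it as coming from 192.168.1.1:27017, the address it originally connected to.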
By retrying our previous nc command, we get:
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
Hi from server
By looking at the packets received by Server 1, we can see that the source and destination IPs have been changed by our front server.
user@server-1 ~ $ tcpdump -i eth1
15:29:37.384773 IP 10.0.0.1.38921 > 10.0.0.2.1234:
Flags [S], seq 3215489734, win 29200, options [mss 1460,sackOK,
TS val 21461495 ecr 0,nop,wscale 6], length 0
As iptables is most commonly used as a firewall, it's time to use this main feature by adding rules that drop every forwarded packet not explicitly allowed.
Each built-in iptables chain has a default policy, which applies to any packet that does not match a rule in the chain. With a DROP default policy, any connection that is not explicitly accepted will be dropped.
iptables -t filter -P FORWARD DROP
The SNAT and DNAT rules written previously only modify the packet headers; they have no effect on filtering. With the default policy set to DROP, we now need to explicitly accept traffic going to and coming from Server 1:
# Accept traffic to Server 1 (--dport requires a protocol match, hence -p tcp)
iptables -t filter -A FORWARD -p tcp -d 10.0.0.2 --dport 1234 -j ACCEPT
# Accept traffic from Server 1
iptables -t filter -A FORWARD -p tcp -s 10.0.0.2 --sport 1234 -j ACCEPT
We are now able to forward traffic going to the TCP port 27017 of our front server to a server hosting a single node application.
The next step is now to distribute connections across multiple nodes hosting our application.
In order to load balance between multiple hosts, a solution is to change the DNAT rule so it won't always redirect the clients to a single node but distribute them across multiple nodes.
To distribute those connections between Server 1, Server 2 and Server 3, we could be tempted to define these rules:
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-j DNAT --to-destination 10.0.0.2:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-j DNAT --to-destination 10.0.0.3:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-j DNAT --to-destination 10.0.0.4:1234
However, the iptables engine is deterministic: the first matching rule is always used. In this example, Server 1 will get all the connections.
To address this issue, iptables includes a module called statistic that skips or applies a rule based on statistical conditions.
The statistic module supports two different modes:
random: the rule is skipped based on a probability
nth: the rule is skipped based on a round-robin algorithm
Note that the load balancing only happens during the connection phase of the TCP protocol. Once the connection has been established, it will always be routed to the same server.
To really load balance traffic on 3 different servers, the previous three rules become:
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-m statistic --mode random --probability 0.33 \
-j DNAT --to-destination 10.0.0.2:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-m statistic --mode random --probability 0.5 \
-j DNAT --to-destination 10.0.0.3:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-j DNAT --to-destination 10.0.0.4:1234
Notice that 3 different probabilities are defined and not 0.33 everywhere. The reason is that the rules are executed sequentially.
With a probability of 0.33, the first rule is applied about 33% of the time and skipped about 67% of the time.
With a probability of 0.5, the second rule is applied 50% of the time it is evaluated. However, since it is placed after the first rule, it is only evaluated for the 67% of connections that the first rule skipped. Hence this rule is applied to only \(50\% \times 67\% \approx 33\%\) of connections.
Since only the remaining 33% or so of the traffic reaches the last rule, it must always be applied.
You can compute the probability to set on each rule from the number of rules \(n\) and the rule index \(i\) (starting at 1) with \(p=\frac{1}{n-i+1}\).
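As a quick check of this formula, rule \(i\) is only reached when the \(i-1\) rules before it were skipped, so its overall share is \(\frac{n-1}{n}\cdot\frac{n-2}{n-1}\cdots\frac{n-i+1}{n-i+2}\cdot\frac{1}{n-i+1}=\frac{1}{n}\): every server gets the same share. The sketch below applies the formula to generate the rules for any number of servers. It only prints the commands instead of running them, and print_random_lb_rules is a hypothetical helper name, not part of iptables.

```shell
#!/bin/sh
# Print (do not apply) one DNAT rule per backend so that each backend
# receives an equal share of new connections.
# Usage: print_random_lb_rules FRONT_IP FRONT_PORT BACKEND_PORT BACKEND_IP...
print_random_lb_rules() {
  vip=$1; vport=$2; bport=$3
  shift 3
  n=$#
  i=1
  for backend in "$@"; do
    match="-A PREROUTING -t nat -p tcp -d $vip --dport $vport"
    if [ "$i" -lt "$n" ]; then
      # p = 1 / (n - i + 1), rounded to 4 decimals
      p=$(LC_ALL=C awk -v n="$n" -v i="$i" 'BEGIN { printf "%.4f", 1 / (n - i + 1) }')
      match="$match -m statistic --mode random --probability $p"
    fi
    # The last rule gets no statistic match: its probability would be 1.
    echo "iptables $match -j DNAT --to-destination $backend:$bport"
    i=$((i + 1))
  done
}

print_random_lb_rules 192.168.1.1 27017 1234 10.0.0.2 10.0.0.3 10.0.0.4
```

For three servers this prints the same three rules as above, with probabilities 0.3333 and 0.5000 on the first two and none on the last. Reviewing the printed commands before running them as root is a deliberate choice in this sketch.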
The other way to do this is to use the nth mode, which implements a round-robin algorithm.
This mode takes two parameters: every (n) and packet (p). The rule matches one packet out of every n, starting at packet p. In the nat table only the first packet of a connection is evaluated, so here a packet effectively means a new connection.
To load balance between three different hosts you will need to create those three rules:
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-m statistic --mode nth --every 3 --packet 0 \
-j DNAT --to-destination 10.0.0.2:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-m statistic --mode nth --every 2 --packet 0 \
-j DNAT --to-destination 10.0.0.3:1234
iptables -A PREROUTING -t nat -p tcp -d 192.168.1.1 --dport 27017 \
-j DNAT --to-destination 10.0.0.4:1234
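To see why --every 3, then --every 2, then a rule without any condition produce a strict rotation, the sketch below simulates the chain. It is a simplified model, not the kernel's code: it assumes each rule keeps its own counter, only counts the connections that reach it, and matches the first of every n connections it sees. The simulate_nth helper is a hypothetical name.

```shell
#!/bin/sh
# Simulate N chained "nth" rules: rule i uses --every (N - i + 1) --packet 0
# and only sees the connections skipped by the rules placed before it.
# Usage: simulate_nth N_BACKENDS N_CONNECTIONS
# Prints the index of the chosen backend, one line per connection.
simulate_nth() {
  awk -v n="$1" -v total="$2" 'BEGIN {
    for (c = 0; c < total; c++) {
      for (i = 1; i <= n; i++) {
        every = n - i + 1
        hit = (seen[i] % every == 0)  # first of every "every" connections seen
        seen[i]++
        if (hit) { print i; break }
      }
    }
  }'
}

simulate_nth 3 6 | tr '\n' ' '; echo
# prints: 1 2 3 1 2 3
```

The first rule takes one connection out of three. The second rule only sees the two remaining ones and takes one out of two, and the last rule takes whatever is left.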
Since we have a DROP default policy on our FORWARD chain in the filter table, we need to allow the three remote servers. This can be done with 6 iptables rules:
iptables -t filter -A FORWARD -p tcp -d 10.0.0.2 --dport 1234 -j ACCEPT
iptables -t filter -A FORWARD -p tcp -d 10.0.0.3 --dport 1234 -j ACCEPT
iptables -t filter -A FORWARD -p tcp -d 10.0.0.4 --dport 1234 -j ACCEPT
iptables -t filter -A FORWARD -p tcp -s 10.0.0.2 --sport 1234 -j ACCEPT
iptables -t filter -A FORWARD -p tcp -s 10.0.0.3 --sport 1234 -j ACCEPT
iptables -t filter -A FORWARD -p tcp -s 10.0.0.4 --sport 1234 -j ACCEPT
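With more servers, writing two rules per server by hand gets tedious. A minimal sketch of a generator follows. The print_forward_rules helper is a hypothetical name, it only prints the commands, and it adds -p tcp because iptables only accepts --dport and --sport together with a protocol match.

```shell
#!/bin/sh
# Print (do not apply) the two FORWARD rules needed for each backend:
# one for traffic going to it, one for traffic coming back from it.
# Usage: print_forward_rules BACKEND_PORT BACKEND_IP...
print_forward_rules() {
  port=$1
  shift
  for backend in "$@"; do
    echo "iptables -t filter -A FORWARD -p tcp -d $backend --dport $port -j ACCEPT"
    echo "iptables -t filter -A FORWARD -p tcp -s $backend --sport $port -j ACCEPT"
  done
}

print_forward_rules 1234 10.0.0.2 10.0.0.3 10.0.0.4
```

The output can be reviewed first and then piped to sh as root to apply it.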
Now if our client tries to contact our application, we get the following output from our client:
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
Hi from 10.0.0.2
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
Hi from 10.0.0.3
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
Hi from 10.0.0.4
user@client ~ $ echo "Hi from client" | nc 192.168.1.1 27017
Hi from 10.0.0.2
[...]
In this article we saw how to build a TCP load balancer based on iptables and the Linux kernel. We use this method in a TCP gateway that currently serves production IoT applications. The same method is used to provide direct Internet access to databases.
In light of Cloudflare's recent work on their Spectrum product, we may incorporate some of their ideas into our own TCP load balancer.
Stay tuned, we'll announce official support of TCP apps in the following weeks!