I’ve just had another of those “your firewall’s not working” calls. What a way to start the day. The scenario this time: large UK service provider running multiple hosted applications in several Internet Data Centres across the country.

Application developer testing out a new application, and it’s not working. Since the traffic has to pass through the DC firewalls to get from his remote PC to his DC server, the first step in fault finding, as usual, is just to raise a fault call with the firewall team.

It doesn’t start too badly, actually. The PC that he’s testing from does in fact have an IP address within the range of addresses that have been set up to be allowed through to that server address.

You’d be surprised (or maybe not) how often that isn’t the case. So the initial traffic is passing the firewall rules. And the firewall has a route to the server subnet, so it’s sending the traffic on. But he’s not getting any response back at his PC.

Okay, time for a bit more digging. It would be nice if someone could look on the server and tell me if the connection request is getting to it, but of course the guy who is developing the application on the server doesn’t have rights to do this, and nobody can find the server admin who does.

So: back to using the firewall as an analyser again. I can see data from the PC’s IP address to the server address he’s given me, but nothing coming back. Not from that server address anyway.

Hang on—I am seeing traffic back from a different IP address going to the PC address though, which the firewall is dropping since it’s never heard of this source address and there’s nothing in its rules to let it through. What’s going on?

Several phone calls later, and we find somebody who knows how this part of the network is put together. I won’t even start on why nobody lets the firewall team see the network diagrams so they can figure out how it’s supposed to work. That would be far too easy.

And the missing piece—the load balancers that have been installed. There isn’t actually one server here—there are three. The ‘server address’ that the application developer was using is the VIP address on the load balancer, which then passes the traffic through to the best server to handle the request. Thanks for telling us, guys!

In itself that’s not a problem and certainly not new, and a lot of the time they even get it to work first time round. But in this case it wasn’t set up quite right. Normally, traffic goes from PC to VIP address, from load balancer to real server, back to load balancer, and back from VIP address to PC.

The firewall sees the inbound and return traffic, matches it up, and lets it through. It’s clever like that—that’s how it was designed. In this case, though, the load balancer was passing the redirected traffic to the real server transparently, with a source address of the PC, not itself, so the server was replying back directly to the PC, taking the load balancer out of the return path.
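For anyone who hasn't watched a stateful firewall do its thing, here's a minimal sketch in Python of the idea. The addresses are made up and the state table is deliberately simplified, so treat it as an illustration rather than anything resembling a real vendor's implementation: the return packet only gets in if it matches an entry recorded when the outbound packet was allowed through.

```python
# Minimal sketch of stateful connection tracking (illustrative only,
# not any particular vendor's implementation).

# A state entry is keyed on the 5-tuple of the allowed outbound packet;
# the expected reply is that tuple with source and destination swapped.

def make_key(src_ip, src_port, dst_ip, dst_port, proto):
    return (src_ip, src_port, dst_ip, dst_port, proto)

state_table = set()

def outbound(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """PC -> VIP passed the rulebase, so record the reply we expect back."""
    state_table.add(make_key(dst_ip, dst_port, src_ip, src_port, proto))

def inbound_allowed(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Return traffic is only passed if it matches a recorded state entry."""
    return make_key(src_ip, src_port, dst_ip, dst_port, proto) in state_table

# What happened in this case (hypothetical addresses):
outbound("10.1.1.50", 40000, "192.0.2.10", 443)   # PC -> VIP

print(inbound_allowed("192.0.2.10", 443, "10.1.1.50", 40000))  # True:  reply from the VIP
print(inbound_allowed("192.0.2.21", 443, "10.1.1.50", 40000))  # False: reply from the real server's own address
```

That second result is exactly what I was seeing on the firewall: a reply from an address it had never recorded, so it dropped it.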

Again, this can be done. It’s called direct server return, asymmetric server return, nPath or SwitchBack, depending on your vendor, and it can improve performance quite drastically by taking the return traffic off the load balancer once it has made its choice of which server to send the request to.

But normally the VIP address that the users know about (which is actually a load balancer address) also appears on the real servers, so that they can reply from the address the user thought he was talking to. That makes sense if the application is supposed to tie up request and response somehow.

And it makes just as much sense if the firewall is supposed to tie up the two-way traffic flow. Just what exactly did you think was going to happen if traffic goes through the firewall in one direction to one IP address, and comes back from a completely different one? Is it any wonder it blocked the traffic?
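Here's the same point as a small self-contained sketch, comparing what the firewall's state check sees in the broken setup against a direct-server-return setup where the real server also owns the VIP. On Linux that's commonly done by putting the VIP on a loopback interface and suppressing ARP for it, though the exact mechanism varies by OS and vendor, and the addresses below are again made up for illustration.

```python
# What the firewall's state check sees: broken setup vs. DSR done properly,
# where the real server also answers from the VIP it owns.
# Hypothetical addresses, simplified to a single expected reply tuple.

PC_IP, PC_PORT = "10.1.1.50", 40000
VIP = "192.0.2.10"          # the address the developer (and the firewall rule) knows about
REAL_SERVER = "192.0.2.21"  # the real server's own address, never in the rulebase

# The state entry created when PC -> VIP passed the rulebase:
expected_reply = (VIP, 443, PC_IP, PC_PORT)

def firewall_passes(reply_src_ip):
    """Return traffic only gets through if it matches the recorded state entry."""
    return (reply_src_ip, 443, PC_IP, PC_PORT) == expected_reply

print(firewall_passes(REAL_SERVER))  # False: broken setup, server replies from its own address
print(firewall_passes(VIP))          # True:  DSR set up right, server replies from the VIP it also owns
```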

So, either change the load balancer config or set the servers up right—I don’t really mind, just stop blaming my firewalls.