Advice for selecting a proper FreeBSD platform for proxy/URL filtering

Hello all,

Since 2002 I have had a FreeBSD 3.x installation in the role of firewall and proxy, with URL filtering. Not fancy hardware, just a Pentium 4 system with 2 GB of RAM and a HDD, serving the needs of 130 users (at the time).

The WAN link was a 1 Mbps ADSL line. In 2008 the system got a WAN upgrade to a 4 Mbps SDSL line. That's when I observed the first "issues": the only way I could make the squid/squidguard combo fully utilize the bandwidth was with single large file downloads. Throwing a couple of LAN Google Maps / Earth users into the mix would bring the system to its knees, while not even 1 Mbps of the line was being used.

Our bandwidth is now 100 Mbps on the WAN side, while our users number around 150, but even though the speed has increased by more than an order of magnitude, throughput from the WAN is still poor (unless single large transfers take place). FreeBSD was source-updated up to 8.1; the switch to binary-only upgrades made me stop updating the system. Obviously the system is rather old (even though FreeBSD's stability during these 10+ years was exemplary), so it is time for an upgrade.

A 4th-gen Core i5 is available for use, along with 3 high-quality Intel PCIe NICs and a WD 500 GB Black series disk. 8 GB of memory will be available. These were given to me. Yes, I know things could be much better if those cheap low-power Intel-based server CPUs were available to me, yet even getting my hands on this hardware required some epic battles...

The role of the system will be exactly the same. My problem will mainly be to replicate the settings of my old, EOL FreeBSD 8.1 box on the new 10.3 (most likely) box I'm building. The role will be firewalling and squid/squidguard filtering (Shalla blocklists with some custom ones). My questions are the following:

1) Should I go for 32- or 64-bit, considering that the main use is URL filtering of web access while maintaining good throughput on the WAN line?

2) Any hints on what I should go for in order to maximize this throughput?

Being a newbie, I am not used to FreeBSD's utilities/methods for debugging such a situation, if it arises on the new platform (which, I am afraid, it will). Any help / solid advice will be appreciated.
 
1) Should I go for 32- or 64-bit, considering that the main use is URL filtering of web access while maintaining good throughput on the WAN line?
A 32-bit OS can only address 4 GB of memory. As you have more than that, you're going to need a 64-bit OS (amd64).
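
If you want to double-check what the kernel actually sees once the new system is up:
Code:
# physical memory as seen by the FreeBSD kernel, in bytes
sysctl hw.physmem hw.realmem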

2) Any hints on what I should go for in order to maximize this throughput?
Back up the old config and start with a fresh, default config, nothing fancy, and see how things progress. Nine times out of ten, the reason something performs badly is a wrongly configured service, usually because someone tried to "optimize" things without knowing what they were doing. Too many optimizations (or the wrong ones) can actually make things worse, not better.

Being a newbie, I am not used to FreeBSD's utilities/methods for debugging such a situation, if it arises on the new platform (which, I am afraid, it will). Any help / solid advice will be appreciated.
The number one source would be the Squid access and cache logs.
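
For throughput problems in particular, the elapsed-time field of the native access.log is the one to watch. A quick sketch, assuming the default native log format and log location:
Code:
# list the 20 slowest requests; in Squid's native access.log format
# field 2 is the elapsed time in milliseconds and field 7 the URL
sort -rn -k2 /var/log/squid/access.log | head -20 | awk '{print $2, $7}'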
 
...
2) Any hints on what I should go for in order to maximize this throughput?
...

Two considerations:

1) Compared to 10+ years ago, a considerable and still growing share of web traffic now runs over HTTPS. You may want to consider setting up Squid as an HTTP and HTTPS proxy, so it can apply its filters to the TLS traffic as well. Unfortunately, the setup procedure is not very well documented, and it needs a little more care; for example, a good CA certificate store needs to be maintained. Also, in many countries users must be legally informed that they are browsing the web via an HTTPS proxy. If this becomes relevant, please ask once your basic setup is up and running.
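
For orientation, a minimal squid.conf sketch of such a bumping setup (directive names as in Squid 3.5; all paths and the squidCA.pem certificate are placeholders, you create the CA yourself, and the Squid port needs to be built with the SSL/certificate-generation options):
Code:
# squid.conf fragment -- SSL bumping with Squid 3.5 (sketch, paths are examples)
# squidCA.pem is a locally created CA that must be rolled out to all clients
http_port 3128 ssl-bump cert=/usr/local/etc/squid/squidCA.pem \
    generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
# helper that generates per-host certificates; initialize the db once with:
#   /usr/local/libexec/squid/ssl_crtd -c -s /var/squid/ssl_db
sslcrtd_program /usr/local/libexec/squid/ssl_crtd -s /var/squid/ssl_db -M 4MB
acl step1 at_step SslBump1
ssl_bump peek step1   # peek at the TLS client hello (SNI) first
ssl_bump bump all     # then impersonate the server and decrypt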

2) Nowadays, one of the heaviest parts of web traffic, if not the heaviest, is generated by all the embedded advertising. A server-side, DNS-blacklist-based ad-blocker would take this huge chunk of traffic out of what Squid needs to handle, at no additional CPU load, since the DNS queries need to be done in any case. The additional benefit is that your users would enjoy an almost ad- and tracking-free web experience. I wrote an article on my blog about how this works. It is written in German, but with the aid of an online translator you should be able to understand most of it; in case of doubts, please ask.

German: http://blog.obsigna.net/?p=509
English: https://translate.google.com/translate?sl=de&tl=en&js=y&hl=de&ie=UTF-8&u=http://blog.obsigna.net/?p=509

UPDATE 2016-12-15: see also: https://github.com/cyclaero/void-zones-tools
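
The underlying idea, expressed as an unbound.conf fragment (ads.example.net is just a placeholder; the exact entries the tools generate may differ):
Code:
# "void zone": resolve an ad/tracking domain to the null addresses,
# so clients never open a connection to it
local-zone: "ads.example.net" redirect
local-data: "ads.example.net A 0.0.0.0"
local-data: "ads.example.net AAAA ::"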
 
The number one source would be the Squid access and cache logs.
These logs helped me with tuning my URL filters, but unfortunately not at all with optimizing throughput.

I see HTTP/2 coming into play and do not know whether Squid 3.x can handle it. I was seeing a large number of problems with my Squid 2.7.x though, especially on HTTPS sites. But that, as they say, is another story...

Also, in many countries users must be legally informed that they are browsing the web via an HTTPS proxy. If this becomes relevant, please ask once your basic setup is up and running.
Due to the nature of my agency, I would prefer to stay away from HTTPS proxying because of its interception nature. OTOH, HTTPS traffic is becoming a large chunk of the daily bandwidth pie, so I think I'll have to rethink my strategy sooner or later.

A server-side, DNS-blacklist-based ad-blocker would take this huge chunk of traffic out of what Squid needs to handle, at no additional CPU load, since the DNS queries need to be done in any case.

Seems nice; it will definitely ease the load on squidguard. However, it is not sites like those that create problems for me, it's the legitimate sites that make many small HTTP requests. HTTP/2 will change things, but I still don't know whether Squid will cope.

TBH, what I really wanted was a URL filter. Caching was good for me in the ADSL era; now I just need very fast decisions to block or allow traffic. Hope I can win it this time.
 
You don't need to worry about HTTP/2 just yet. Nobody has started using it in any serious way. It's good to be prepared for the future but I think it's still a little too early for it.
 
Thanks for this correction; I was under the impression that deployment had already started, and quickly too.

If that is OK, I will not tag this thread as "solved", to allow other posters to join in and discuss.
 
You don't need to worry about HTTP/2 just yet. Nobody has started using it in any serious way.
All of my clients use HTTP/2 and have so for several months. Two of them are very large clients.

Every new client in the know asks if we would upgrade them to HTTP/2 and we do this automatically.
 
And how much use does it get compared to HTTP/1.x?

Every new client in the know asks if we would upgrade them to HTTP/2 and we do this automatically.
Sure, the customer wants it, you deliver. I get that. I'm just wondering if they actually use it or if it's just something "new" they want. Likewise, IPv6 has been around for at least a decade and most ISPs still can't provide native IPv6 and don't even have plans for it in the foreseeable future.
 
...
TBH, what I really wanted was a URL filter. Caching was good for me in the ADSL era; now I just need very fast decisions to block or allow traffic. Hope I can win it this time.

So then, simply forget Squid. Set up local_unbound on the router as the caching resolver for your internal clients, and add the domains that you want to block, one per line, at the end of the unbound.conf(5) file /etc/unbound/unbound.conf in the following format:
Code:
# UNBOUND CONFIGURATION
...
...
# URL BLOCK LIST
local-zone: "facebook.com" static
local-zone: "youtube.com" static
local-zone: "playboy.com" static
local-zone: "other.forbidden.dom" static
...
In this case, HTTP/2 would be of no concern on the router side; only make sure that your users' web browsers can handle it. In the firewall you need to add a rule that prevents local clients from bypassing your local DNS resolver. In my ipfw(8) configuration I have:
Code:
/sbin/ipfw -q add 1000 deny ip from not me to any 53 out xmit $WAN
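If the firewall is pf(4) rather than ipfw, a rough equivalent of that rule might look like this (a sketch; $wan_if is assumed to be your WAN interface macro in pf.conf):
Code:
# pf.conf fragment: drop outgoing DNS on the WAN unless it originates
# from the router's own addresses ($wan_if must be defined elsewhere)
block return out quick on $wan_if proto { tcp, udp } from ! ($wan_if) to any port 53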
 
So then, simply forget Squid.
Not so easy, since I need granular control per user group; facebook.com might be allowed for some, disallowed for others. Plus, within a single site, some URLs might be allowed whereas others are not.
 
I forgot to mention that in my existing topology with my FreeBSD 8.1 box, nothing goes out without passing through the proxy. No NAT or routing takes place.
 
I forgot to mention that in my existing topology with my FreeBSD 8.1 box, nothing goes out without passing through the proxy. No NAT or routing takes place.
That's an excellent way to contain any malware infections you may get. You can probably simply copy the Squid config from the old box to the new one. It may need some minor edits but most of it should still be valid.

Don't try to tweak the various TCP settings just yet. Let it run for a while; FreeBSD usually does an excellent job with the default values. Install some nice monitoring to keep an eye on it. Tweaking and tuning is an ongoing chore.
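
Even before a full monitoring setup, the base system has enough to eyeball things, for example:
Code:
systat -ifstat 1   # live per-interface throughput, refreshed every second
top -S -H          # CPU usage including kernel threads
netstat -m         # mbuf usage, i.e. network buffer pressure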
 
That's an excellent way to contain any malware infections you may get. You can probably simply copy the Squid config from the old box to the new one. It may need some minor edits but most of it should still be valid.

Thanks, that's how I'm proceeding now, although there are a zillion things to configure and I cannot run both systems (old/new) in parallel (it could be done, but I'd have to start meddling with the old setup).

BTW, which Squid version would you propose I install? 3.x?

<off-topic>The trickiest thing will be porting my 3 pages of pf rules nicely. The beauty of my box is that it has worked so nicely over the years that I don't recall the specifics of how to do things hahaha. It's like starting from scratch in some areas.</off-topic>
 
Well, I developed the tools; the port was done by somebody else, and I was pleasantly surprised myself that the tools made it into the ports.
 
Not so easy, since I need granular control per user group; facebook.com might be allowed for some, disallowed for others. Plus, within a single site, some URLs might be allowed whereas others are not.

That's the way I'm doing it: Squid + proxy auth. Unrestricted access is only granted to some whitelisted sites (e.g. some repositories) and some whitelisted sources; everything else requires proxy authentication.
URL and/or domain blocking is done with Squid and a simple ACL as well.
You just need to be careful to order the rules so that everything matches the way you want.
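
A minimal sketch of such an ordering in squid.conf (the group names, file paths, and domain lists are placeholders, and the auth_param setup for the authentication helper is omitted):
Code:
# squid.conf fragment: per-group filtering; the first matching
# http_access line wins, so order is everything
acl managers  proxy_auth "/usr/local/etc/squid/users-managers"
acl staff     proxy_auth "/usr/local/etc/squid/users-staff"
acl social    dstdomain  .facebook.com .twitter.com
acl whitelist dstdomain  "/usr/local/etc/squid/whitelist.txt"

http_access allow whitelist      # e.g. package repositories, no auth needed
http_access allow managers       # managers get everything
http_access deny  staff social   # staff: no social media...
http_access allow staff          # ...but everything else
http_access deny  all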
 