Options for a fast Squid setup?

Let me explain our network situation...

We are using Hughes satellite Internet. We pay about $129.00 per month for it and only get 500 MB per day of download bandwidth. If we go over, we get SLOW internet for at least a day, and sometimes the internet is slow even when we do not go over. Although we have a 2 MB/sec signal, it's been slower than that at times.

So I want to set up a Squid system, and I'm hoping it will help speed things up and make it harder to go over the limit.

If at all possible, I would like to run 3.1. If that is not stable enough, then I would do 3.0. I initially ran into some problems, and then I realized I was using 2.7 along with some older configuration guides, which did not help. I was able to get it working non-transparently, but I noticed no speed improvement whether pulling files off the internet or off my own server.

At any rate, the computer I have has a 40 GB hard drive (IDE), 2 GB of RAM, a Pentium 4 processor (I forget the speed... I think it's 2 GHz or something like that), and two Ethernet cards.

I would like it to be able to run as fast as it can without me having to invest too much money into it.

My thought is to make it a transparent proxy and use COSS with diskd. My problem is that I can hardly find anything on setting up either of those on Squid 3.0.

Is there anything I can do at all to improve speeds without me having to spend too much $$?

Or maybe squid is not my answer? I don't know. Anything that would help us improve our speed and avoid Hughes' FAP (Fair Access Policy) would be helpful.

Oh yeah, and in case anyone asks: the reason we have Hughes is that we cannot get any other internet where we are. Even cell phones do not work.

Thanks,
~Shawn
 
Also, I'm planning on buying another IDE hard drive. I'd love it to be a super-fast hard drive, but I do not want to spend the money for that.

Thanks,
~Shawn
 
The most likely reason you're not seeing much speed improvement is that, even when an object gets served from the local cache, Squid still needs to talk to the original webserver to find out whether the cached object is still fresh, especially when an object has been on disk for some time.

It uses a short HEAD request for that, but the round trip still takes about 600 ms to get back to you. It will most certainly save bandwidth anyway.

I believe objects that are still in memory, and objects that were committed to disk only recently, will not need to wait for that transatlantic 'check'.

Looking at your hardware specs, going for COSS or diskd is not likely to improve speeds in any way (and diskd requires a custom kernel). Stick with the much simpler UFS or AUFS schemes. Better to invest in RAM and give as much cache_mem as you can to Squid, so it can store a lot of objects there and serve them up at top speed.
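For illustration (values here are made up for a 2 GB machine, not a recommendation), the relevant squid.conf lines would look something like:

[code]
# keep plenty of hot objects in RAM (leave room for the OS and Squid's index)
cache_mem 512 MB
maximum_object_size_in_memory 512 KB

# a simple AUFS disk cache: 20000 MB, 16 first-level and 256 second-level dirs
cache_dir aufs /usr/local/squid/cache 20000 16 256
[/code]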
 
Thank you for your reply. I appreciate the help.

I'm thinking I might just give up on FreeBSD. I've been trying for a long time now and so far I've not been very successful. I've also been reading a lot about running squid transparently, and most threads on here tell people to do a search for "squid transparent", while hardly any (if any) threads have the answer. I just PM'd a guy who had posted on the FreeBSD forum here wanting help getting transparency working. He said he gave up on FreeBSD because of all the trouble he was having, and went to Slackware.

I'm thinking of just installing a Gentoo Server.

Thanks,
~Shawn

 
Running squid in transparent mode has nothing to do with the underlying OS. The syntax is still the same, and it is well-documented in the Squid FAQ and in the default/documented squid.conf file installed with the port. The only difference is redirecting traffic to Squid using one of the three built-in firewalls. There's no shortage of examples around the web. I have no idea how this would be any easier on Slackware or Gentoo.
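To sketch what that redirection amounts to (addresses and the rule number are made up for the example, and the http_port option name differs between Squid versions):

[code]
# squid.conf -- accept intercepted traffic
# (Squid 2.x/3.0 syntax; Squid 3.1 uses 'intercept' instead of 'transparent')
http_port 3128 transparent

# ipfw rule on the gateway: forward LAN web traffic to Squid
ipfw add 100 fwd 127.0.0.1,3128 tcp from 192.168.1.0/24 to any 80
[/code]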
 
Then again, I have a lot of experience with Gentoo and I really like it.

It's highly configurable and easy to make do what you want. But at times I've run into problems with it when it comes to updates.

As for FreeBSD, I know almost nothing about it, but I would like to learn if I could. There are lots of things you can do with Gentoo to simplify things and speed things up, but very little, as far as I know, on FreeBSD.

Here is what I want to do:

1. Set up squid proxy server and have it run successfully in transparent mode.
2. Tweak it to the best of my ability (possibly buying more RAM and another hard drive if need be) to increase speeds and make our satellite internet experience a little more enjoyable.
3. Configure the server to automatically start up and automatically start squid when the computer's power button is pressed, or after a power outage.
4. Buy and install the VideoCache package (which costs $10) so that I can cache most videos and Mp3 files.
5. Be able to monitor web traffic and limit bandwidth to some computers if necessary to keep us from going over the 500MB/day limit.

So far, I've spent hours and hours studying, and I've found that most of the articles are old and unhelpful. A lot of them are geared toward 2.4 or 2.7. And it seems that most of the forums are full of people who do not have the answers.

Nevertheless, if it is at all possible to do the above with FreeBSD... then I'd do it.
 
OK, I think I'll do some more reading and try to get FreeBSD working transparently.

Is there any way at all that I can improve speed on my squid setup? I already have 2 GB of RAM and I do not know if this motherboard will handle any more. I'll do some reading on that and find out. If it can, I might buy some.

I guess what I'm asking is: if COSS and diskd are out of the question, is there any other way of improving performance and speed?
 
1, 2, 3 and 5 are totally possible, and there are alternatives for almost anything in the base system and in ports. Never heard of 4. BTW: Squid is developed on FreeBSD.

FreeBSD is highly configurable, because admins get to decide what FreeBSD does, there's no vendor or distro seller deciding it for you. Downside (to me: upside) -> you'll need to learn a lot of stuff and invest a lot of time. If you want to be in control, that is. Because that's FreeBSD for you.
 
More RAM and more disks are the way to bigger speed. You can create separate Squid caches on separate disks and tell Squid to use all of them.

Note that diskd doesn't actually make things faster; it just takes the disk access work out of the main Squid process, which is only interesting if you have hundreds of people browsing their heads off. Don't overestimate your needs. A few people browsing will never saturate the combined bandwidth of RAM and any reasonable disk. Buy some 15K platters if you really need them.

I have 8 people browsing quite a lot all day long, using a single RAID0 disk cache (not even a dedicated Squid partition on it), and no one has any speed problems whatsoever. The hit rate is always around 50% on objects, and 30-40% on bandwidth savings.

The bulk of your performance drop is due to transatlantic checks of cache objects. The more objects you cache in RAM, the less that will happen, and once a cache is filled up nicely and gets a nice hit rate, it will happen even less -- and I deployed quite a few Squid caches on satellite links in countries even less fortunate than yours when it comes to Internet access, with 2,000+ people on a 3 Mbit link, shared with VOIP and streaming audio/video (radio/tv), with FreeBSD handling the firewalling, load balancing, traffic shaping and squid caching in one, on an old written-off HP server from 2000. I know FreeBSD can handle this without any trouble.

Good luck with Gentoo or Slackware ;)
 
I see. Well, I'll have to give it another go then. Yeah, I do find that one thing puzzling... that squid was made on FreeBSD. It ought to work great.

I think, though, that a large part of my problem is probably that I've still got 2.7, 3.0, and 3.1 installed. I've tried to uninstall them manually but have not been able to. Do you have any idea how I can uninstall them manually? I did not install them from ports. I would like to just uninstall them all, erase the cache (squid's FAQ page tells me how to do that), and then start fresh.
 
Well, I'm going to have to buy more hard drives then. I wish I was not so limited to IDE, which is kind of slow compared to newer hard drives. As far as the hard drives go, does size matter, or do I just need a few of them? Like, if I bought a couple of 20 GB hard drives... that ought to be enough, right? And then my other problem is that I still have to figure out how I'm going to have one master and 2-4 slaves. I thought you could only have one master and one slave.

What is '15k platters'?

Thanks a lot!
~Shawn
 
Don't go overboard with disks, because you will hardly manage to fill them. If you deploy 100GB of cache, chances are that Squid will never reach that limit, because it will throw away stuff that's becoming too old (stale). A good cache size for, say, a household, is 20-40GB tops. Small office: maybe 50-75GB. Anything more will hardly ever be used.

It is important to either dedicate entire disks to Squid, or at least make a separate partition (mountpoint) for /squid.

I don't know how you installed squid, so I don't know how to deinstall it. Your best bet may be to run [cmd=]find / -type d -name squid[/cmd] and simply remove every directory named 'squid' (usually /usr/local/squid and /usr/local/etc/squid when installed from ports). You can clear the Squid cache by simply wiping the /path/to/squid/cache directory, which will be recreated by a [cmd=]squid -z[/cmd] later.

I can vouch for the stability of any Squid version in ports, because I've used (and am using) about all of them.

Oh, and '15K platters' is '15,000 RPM disks'. Probably not on IDE though ;)
 
BTW: don't be intimidated by squid.conf either. My typical squid.conf is about 15 lines long, plus one single redirecting firewall rule.
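As a rough illustration of how short a working config can be (the paths and LAN range below are assumptions; the documented defaults handle the rest):

[code]
http_port 3128
cache_mem 256 MB
cache_dir ufs /usr/local/squid/cache 20000 16 256
acl localnet src 192.168.1.0/24
http_access allow localnet
http_access deny all
[/code]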
 
OK, I've got a couple more questions, if you have the time.

First, all that is VERY impressive on a 2000 HP computer. Very impressive.

As far as our internet setup goes, I think I definitely have to get YouTube videos cached, as we tend to watch a lot of videos (esp. my brother). The other thing is, my dad does eBay sales and I do all the web development work for his website. And I'm thinking about doing more web development work for other people too. That said, I think I really need a decent-sized cache, but not too big. I'm thinking 50 GB should be plenty, especially seeing that we are only allowed 500 MB per day but do have 2:00-7:00 AM free. Maybe I'll just get 40 GB.

So, are there advantages to having a couple of smaller disks, or should I just have one large disk set aside for only the cache?

I've deleted all of the squid files. Thanks for your help on that. Now, if you think it's much better to install squid from ports, do you know how I can install it from ports and still be able to configure it (like for transparency and such)? I also would like to install 3.0. The default one that it installs is 2.7.

I'm guessing a 15,000 RPM disk would be very pricey, unfortunately. Maybe if I could get a faster disk that would still work on IDE, or run it via FireWire, that might work.
 
Several smaller disks are better than one. If you have, say, 4 × 10 GB disks, just create 4 squid caches and tell Squid to use all 4 of them at the same time. This means faster disk reads (simultaneous fetches). You can capture larger objects (like videos) by setting maximum_object_size pretty high (like 16 MB).
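Assuming the four disks are mounted on /cache1 through /cache4, that could look like:

[code]
# one cache_dir per spindle; Squid spreads objects across all of them
cache_dir aufs /cache1 9000 16 256
cache_dir aufs /cache2 9000 16 256
cache_dir aufs /cache3 9000 16 256
cache_dir aufs /cache4 9000 16 256

# allow larger objects (e.g. video) into the cache
maximum_object_size 16 MB
[/code]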

Everything installed from ports can be configured. This is FreeBSD ... you decide, no one else will do it. Ports installations are nothing more than source-based installations, with FreeBSD-specifics added so a) everything will work and b) everything will end up in predictable places (i.e. under /usr/local/ so you don't have to search for it) and c) services can be switched on or off using predefined scripts in /usr/local/etc/rc.d/ and entries in /etc/rc.conf (like squid_enable="YES").

Simply make sure your ports tree is up to date (portsnap(8)), and install the Squid port/version you like (btw: 3.1 is fine). The config will live in /usr/local/etc/squid/squid.conf, and a documented example config is installed with it. Almost all defaults can be left alone.
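In outline (the exact port directory is an assumption; check /usr/ports/www/ for the Squid versions actually available):

[code]
portsnap fetch extract             # first run; 'portsnap fetch update' later
cd /usr/ports/www/squid31
make install clean
echo 'squid_enable="YES"' >> /etc/rc.conf
squid -z                           # create the cache directory structure
/usr/local/etc/rc.d/squid start
[/code]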

See the Handbook for details about the ports tree, installing and maintaining ports, making partitions, etc. This is where the learning bit starts, I just point you in the right direction ..
 
It has been a while since my last squid setup, but from what I remember, there's nothing wrong with using coss and aufs simultaneously. I believe coss has a small performance gain over aufs for storing smaller objects.
 
It has been a long time since I have configured a squid setup: I manage one at work, but have not had to do any managing for years.

You will need to build it from ports, because you need to build in firewall support for transparency to work. The option is SQUID_IPFW if you want to stick with IPFW (I long ago switched to pf).

From there, I don't remember too much trouble. Get it working as a normal proxy before you try using it transparently.
 
belikeyeshua said:
We are using Hughes Satellite Internet and we pay like $129.00 per month for it and only have 500MB per day of download bandwidth. If we go over, we get SLOW internet for at least a day. And sometimes the internet is slow even when we do not go over. Although we have a 2MB/sec signal, its been slower than that at times.
It sounds like you're in Africa. Over here we also have sky-high prices, absurd usage restrictions, and a limited choice of access providers. Pretty much every ISP here runs a transparent cache, and there's usually no way to bypass it.

First of all, if you're only allowed 500 MB a day, you can't have very many users. Like, maybe 4? The server hardware you're proposing can easily serve hundreds of internet users so really your focus should be on optimizing your cache hit rate, not your hardware performance. You say it was slow the last time you tried, but I'm not convinced your hardware or FreeBSD were at fault. Did you check your CPU and load averages back then?

FreeBSD can easily do what you want, but so will Linux. Go with whatever gets you to the point of focusing on your cache hit rates. The best I've ever managed from Squid is a 20% bandwidth reduction on web traffic, which didn't translate to much in total, but I guess I didn't spend much time optimizing the cache back then, and squid's caching is hopefully better now. (It's been a few years since my last squid setup.)

The LFUDA cache replacement policy used to be the best. Search around for optimal refresh_pattern rules to make popular and badly developed sites more cacheable too. You'll want to run calamaris against your cache logs to see which sites are popular amongst your users, so that you can optimise the cache for those sites as well.
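For example (the patterns and lifetimes below are illustrative only; tune them against your own calamaris reports):

[code]
# favour frequently-used objects when evicting from the cache
cache_replacement_policy heap LFUDA

# cache static content aggressively: min age, percent-of-age, max age (minutes)
refresh_pattern -i \.(gif|png|jpe?g|ico|css|js)$ 10080 90% 43200  override-expire
refresh_pattern -i \.(zip|exe|mp3|flv)$          43200 90% 432000 override-expire
refresh_pattern .                                0     20% 4320
[/code]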

DutchDaemon said:
The most likely reason why you're not seeing a lot of speed improvement is that, even when an object gets served from the local cache, Squid still needs to talk to the original webserver to find out whether the cached object is not stale, especially when an object has been on disk for some time.
Not necessarily. With a well tuned config that can be reduced.

DutchDaemon said:
It uses a short HEAD command for that, but it still needs about 600 ms to get back to you. It will most certainly save bandwidth anyway.
Actually, squid uses a conditional GET request, which is more efficient than a HEAD followed by a GET. Again though, even conditional GETs can be minimised with a well-tuned config.
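For the curious, a conditional GET looks roughly like this on the wire (the hostname and date are made up; a 304 response carries no body, which is where the saving comes from):

[code]
GET /images/logo.png HTTP/1.1
Host: example.com
If-Modified-Since: Mon, 05 Apr 2010 10:00:00 GMT

HTTP/1.1 304 Not Modified
[/code]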
 
Yeah, I was able to get squid working transparently inside of a jail, which has got to be harder than what you need to do. I'm not an expert either, so it's really not that hard. Also, you don't even have to set it up transparently if you don't need to. Don't give up; FreeBSD is great.

I ended up using 2.7, I think, mainly because I was using that before on my pfSense box. I don't know much about 3.x except that it was considered "experimental" on pfSense last time I checked. But both are in ports. The config file that comes with it is a monster, but I think that's just because squid is one of those programs that can be configured in hundreds of different ways. My actual config was only about 10 lines or so.
 
No, I'm not in Africa. I'm actually just west of Albany, NY. And the service we have is one of the best for satellite. Still not that great.

Anyhow, how can I tweak the config file to make it faster and minimize conditional GETs?

My other problem is that before we moved to Iceland we got rid of all of our computer things. We had lots of small (about 6 GB) hard drives, but we tossed them. So now I do not have any small hard drives and will have to buy them. The only place I can think of getting them is eBay. Stores around here do not carry them that small, and on eBay I personally think they are very expensive for something most people throw away.


 
If you're going to go the eBay route, you might look at old servers. Sometimes you can pick up old servers with small but FAST SCSI drives for not very much. I got an HP ProLiant DL380 G3 with 6 10K SCSI drives and 2 2.8 GHz processors for 20 dollars and shipping (luckily it was within 100 miles, so I just went and picked it up!). I also got a FULL rack server case for FREE.

Also hit up Craigslist; you can sometimes find old computer stuff for free in your area. It's great.
 
DutchDaemon said:
More smaller disks is better.
More disks, period, are better. Smaller disks (those with lower platter densities) are slower.


belikeyeshua said:
So now, I do not have any small hard drives and I'll have to buy them. The only place I can think of getting them from is eBay. Stores around here do not carry them that small. And on eBay, I personally think they are very expensive for something that most people throw away.
Forget the small disk suggestion (no offense, DD). It is very difficult to set up a system today without going overboard with disk space. Abundance is a good thing. You will gain nothing from killing yourself just to get hold of old, small hard drives.

500 GB drives are dirt cheap these days. Get a 500 GB drive. A single 500 GB drive will probably be faster than four ancient 10 GB drives working together. Not just because it has a much higher platter density, but you'll also only be using 10% of the disk, so you can short-stroke it to yield the best performance out of it.
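Short stroking just means confining the data to the outer (fastest) tracks of the platter, e.g. by partitioning only the first slice of the disk. On FreeBSD that might look like the following (the device name ada1 and mountpoint are hypothetical, and these commands are destructive, so double-check the target disk first):

[code]
# use only the first 50 GB of a 500 GB disk for the cache
gpart create -s gpt ada1
gpart add -t freebsd-ufs -s 50G ada1
newfs -U /dev/ada1p1
mount /dev/ada1p1 /cache1
[/code]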
 
aragon said:
500 GB drives are dirt cheap these days. Get a 500 GB drive. A single 500 GB drive will probably be faster than four ancient 10 GB drives working together. Not just because its has a much higher platter density, but you'll also only be using 10% of the disk so you can short stroke it to yield the best performance out of it.

I think it depends on what you get, too. You can sometimes pick up used 10-15K RPM SCSI disks, which are faster. It might be worth checking eBay or Craigslist. Aside from the HP machine I mentioned in the last post, I also got a couple of P3-based Dell servers with 10K RPM SCSI drives for almost nothing. It might be better to buy new stuff most of the time, but sometimes you can get some great used stuff, and for something like this I'd at least look.
 
Thanks aragon! That helped a lot. I was actually just about to ask a couple more questions and then you answered them before I posted them ;).

Anyhow, I completely forgot about SCSI. I will have to get a SCSI PCI card for this computer, but I think that would work. Right now I can't do any searching on eBay because their site seems to be down. Very odd.

I did do some research on this computer here. It has 2GB of RAM and that is the max.

So then, does anyone know how much RAM I can safely give Squid in the config file? I need to save enough for the FreeBSD OS and all. If I allow 1.5 or 1.75 GB, would that be good?
 
It would be nice if I could just use a flash card (like an SD card) or a fast USB drive as RAM, but my guess is that even if that were possible, it would be much slower than actual RAM (which is 333 MHz in my case).
 