FreeNAS to FreeBSD, backups

Hello,

I have used FreeNAS for a number of years, mainly as a remote media server. I know this is bad practice, but I have never really done backups of any kind...

Now, I need to set up a public web server that will hold user data. I have already created a FreeBSD jail (FAMP stack) on my current FreeNAS for testing purposes. It works fine. My plan is to upgrade the hardware and start fresh with FreeBSD.

What would be a good approach for data redundancy / backups, given that I currently have a single SSD for the OS and one HDD for storage? Is ZFS appropriate for both disks (different pools)?

The web server jail currently sits on the SSD, so I imagine backing it up on the HDD would be a start.

Any advice is appreciated.
 
I know this is bad practice, but I have never really done backups of any kind...
There are two kinds of people: those that religiously do backups, and those that have not lost data YET.

Now, I need to set up a public web server that will hold user data.
So you are holding data that other people have given to you, and you feel that you need to be trustworthy, or that your users have reasonably high expectations that the data is still going to be there. That makes perfect sense.

My plan is to upgrade the hardware and start fresh with FreeBSD.
An excellent choice of OS.

What would be a good approach for data redundancy / backups,
First, we need to talk a little bit philosophically about what you are trying to accomplish.

Are you interested in availability? For example, if there is a power outage at the server location, do you need to continue serving? In that case, I might recommend getting a UPS (so your server doesn't crash if the power is out for a fraction of a second, or 30 seconds), perhaps combined with an automatic-starting backup generator (we have a 500 gallon propane tank and a 17kW generator at home ... not for the server, but for the water pumps). But if you are going to that level, you also need to worry about network connections. How are you going to get reliable networks? I've heard of people getting both phone company DSL and cable modem, which leads to the (infamous) acronym "Redundant Array of Incompetent Public Utilities", or RAIP-U.

All joking aside: If I had to actually serve with any sort of expectation of availability, I would no longer attempt to do it myself. Instead I would go to one of the cloud providers (there are several large ones, and a myriad of small ones), and rent a small server. All the major providers have a plan for "free" machines, as long as you only need a small amount of CPU/memory/disk/network. Several of them offer FreeBSD. I personally have a tiny FreeBSD server that is a virtual machine "rented" (for free) from Google Cloud; look in this forum for my post about that. There are also other providers.

Next step, storage availability. All disks fail, some faster than others, but always sooner than you want. If you have any expectation of availability, you *will use* RAID. It is not avoidable today. Why? Otherwise, if your only disk fails, you will be spending literally days restoring the data from backups, and not only are several of your days ruined, but also your server is down for a few days. With the size and reliability of disks today, doing 1-fault-tolerant RAID (like simple mirroring, a.k.a. RAID-1, or parity-based RAID with a single parity disk, like RAID-5 or RAID-Z) is no longer sufficient; the probability of getting a double fault when one disk has already failed is approaching 1 as disks reach about 10TB (which they are today). At home, I still use simple mirroring, but I have extremely good backups, and only data that I can live with being gone for a day or two. For a server, I would today use triple mirroring (3 disks, with 3 copies of the data), and then use reasonably large disks; ~10TB should be more than enough for most small servers. If you need more space, go get about a half dozen to a dozen disks, and run RAID-Z2. At this point, you're looking at a pretty expensive server chassis.
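To put a rough number on the double-fault argument, here is a back-of-the-envelope sketch. It assumes the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 10^14 bits read, and that errors are independent; both are assumptions for illustration, not figures from this thread, and the function name is made up. It estimates the chance of hitting at least one URE while reading the surviving disks in full during a rebuild.

```python
import math


def p_unrecoverable_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `ure_per_bit`
    (an assumed spec-sheet rate, not a measured one)."""
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, evaluated in log space to avoid rounding to zero.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 1e12  # decimal terabyte, as used on drive labels

# Rebuilding a 2-way mirror: one surviving 10TB disk must be read in full.
mirror = p_unrecoverable_read(10 * TB)

# Rebuilding a 4-disk single-parity array: three surviving 10TB disks are read.
parity = p_unrecoverable_read(3 * 10 * TB)

print(f"2-way mirror rebuild:    {mirror:.2f}")
print(f"4-disk RAID-5/Z rebuild: {parity:.2f}")
```

Under these assumptions the mirror case comes out at a bit over one half and the four-disk parity case at around 0.9, so "approaching 1" fits the wider parity arrays better than a simple mirror. Real drives often beat their spec sheet, and a ZFS resilver only reads allocated data, so treat this as an upper-bound style estimate.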

Now in reality, I again think it doesn't make any sense to do this yourself. Go to Amazon AWS, Google Cloud, or Microsoft Azure, look at the price sheet, and rent yourself the CPU, network, and storage you need. It will be cheaper in the long run. And much less hassle. But less fun, if you're the kind of person who thinks that building your own computer is fun.

So far, we have NOT talked about backups at all. All the stuff we did so far was redundancy for availability and resilience of the data against hardware failures. Backups have a different purpose: Even with 12 disks running RAID-Z2 (or a really good setup rented in the cloud), a clueless admin, rogue employee or evil hacker can still do "rm -rf /", and the super-reliable and fast file system will permanently and thoroughly delete all your data. Or you could have a small fire at your house or office, and all three copies in your small server all burn. That's why it's important to have backups: not against disk failures, but against human failures, and off-site backups against destruction of a whole site.

Here would be my suggestion: (a) Use a commercial cloud service. (b) Otherwise, get a server chassis with 3 ... 12 data disks, and run ZFS, either with 3-way mirroring or RAID-Z2. Absolutely use ZFS rather than any other file system, since it gives you RAID built in, plus checksums for data consistency protection. (c) Find some backup software, and do backups, frequently. For example hourly to a separate disk drive, and daily or weekly off-site.

For backup software, I have no concrete suggestion. At home, I use something I wrote myself, which is full of bugs and idiosyncrasies, but happens to work for me: it makes an hourly backup to a small backup disk (which is right next to the server, in a 1300 lbs fire-proof safe), and weekly backups to a disk which is carried off-site. At work, there are infinitely complex backup systems, which are hard to understand.
 
What would be a good approach for data redundancy / backups, given that I currently have a single SSD for the OS and one HDD for storage? Is ZFS appropriate for both disks (different pools)?
I wouldn't bother with ZFS unless you can set it up with multiple storage devices, in a mirror or RAID configuration, so that your data will still be there if one of the disks fails on you. ZFS on a single drive will still work (and it will definitely provide some advantages), but I'd personally rely on UFS instead.

The backup scheme depends heavily on your own preferences; you can set that up any way you'd like.
 
Actually, I disagree. Even on a single disk, I would use ZFS. Why? First, the safeguard of checksums, which will detect silent data corruption by disks. And with the size of disks today, silent data corruption is indeed a real (but rare) thing. Second, the flexibility one gets. For example, if one has a file system on a disk drive, and then later decides that it needs to be moved to another disk, or needs to be enlarged, or needs to be turned into a set of redundant disks, with ZFS that can all be done online while the file system remains live and usable.
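To illustrate why checksums help even on a single disk, here is a toy sketch of the general idea: keep a checksum per block, and compare on read. This is not how ZFS lays out or computes its checksums on disk (ZFS keeps them in its own block pointers and has its own algorithms); the block size, SHA-256, and function names are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a ZFS recordsize


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def find_corrupt_blocks(data: bytes, expected: list,
                        block_size: int = BLOCK_SIZE) -> list:
    """Return the indexes of blocks whose current digest differs from the
    digest recorded when the data was written."""
    current = checksum_blocks(data, block_size)
    return [i for i, (now, then) in enumerate(zip(current, expected)) if now != then]


# "Write" 16 KiB of data and record its checksums.
original = bytes(range(256)) * 64
recorded = checksum_blocks(original)

# Simulate silent corruption: a single flipped bit at byte offset 5000.
damaged = bytearray(original)
damaged[5000] ^= 0x01

print(find_corrupt_blocks(bytes(damaged), recorded))
```

On a single disk this kind of check can only report the damage; with a mirror or RAID-Z there is a second copy to repair from.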

The price one pays is slightly lower performance, which one can also express as higher CPU and memory consumption for the same file system workload. And having to learn new management commands, but that's not terribly difficult.

If I had to create a backup system from scratch today, I would definitely use file system snapshots as an ingredient. ZFS has snapshots, but today UFS does too.
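As a minimal sketch of the snapshot ingredient on ZFS, the helpers below build timestamped snapshot names and the matching `zfs snapshot` command, and pick which old snapshots fall outside a retention window. The dataset name `zroot`, the `hourly` label, and the helper names are hypothetical; the sketch only builds the command list and leaves running it (for example from cron via `subprocess.run`) to you.

```python
from datetime import datetime, timezone


def snapshot_name(dataset: str, label: str, when: datetime) -> str:
    """Build a name like 'zroot@hourly-20240102-0304'.
    The timestamp format sorts lexicographically in time order."""
    return f"{dataset}@{label}-{when:%Y%m%d-%H%M}"


def snapshot_command(dataset: str, label: str, when: datetime,
                     recursive: bool = True) -> list:
    """Return the argv for taking the snapshot; -r also snapshots
    all descendant datasets."""
    cmd = ["zfs", "snapshot"]
    if recursive:
        cmd.append("-r")
    cmd.append(snapshot_name(dataset, label, when))
    return cmd


def expired(names: list, keep: int) -> list:
    """Given snapshot names that share one dataset and label, return the
    ones to destroy so that only the newest `keep` remain."""
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


now = datetime.now(timezone.utc)
print(" ".join(snapshot_command("zroot", "hourly", now)))
```

Each name returned by `expired` could then be passed to `zfs destroy`, one snapshot at a time.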
 
Just my opinion, based on my observations during the years I've used FreeBSD, but UFS is going to be a dead end pretty soon. FreeBSD's own UFS implementation is not used by any other operating system, and it has numerous problems; resilience to power outages is one. It's very common to have to run fsck(8) manually after a power outage or a hard crash. That is pretty unacceptable at a time when journaling that just works is taken for granted. ZFS, on the other hand, has the huge benefit of being interoperable across a large number of different operating systems, and it's renowned for its self-healing features.

I also don't understand the objections to ZFS being resource hungry. It absolutely isn't, and the only systems where ZFS isn't usable in this day and age are the very old i386-only systems with limited memory and I/O performance.
 
I can't help but wonder if some of you aren't blaming the tools for your own setups. I conclude as much because many people mention journaling and UFS as if this is a defined standard, while it isn't. There is more than one way to set up journaling on UFS, and depending on how you do it, it can be pretty robust.

My company maintains a server environment used in South America, where the power supply isn't always very reliable. Our three-year support contract is about to expire, and despite plenty of power failures we haven't had to run fsck manually during that time. We also didn't suffer any data loss because of power failures. The worst we experienced was an HD failing.

These issues are hardly as clear-cut as some of you are trying to portray them.
 
Well, the recommended way is to use SU+J, but the last time I used that it came with its own problems, namely the inability to use UFS snapshots.

I have never experienced real data loss using FreeBSD, with either UFS or ZFS. With UFS, the problem has always been the filesystem metadata getting mangled on hard crashes or power outages and needing manual intervention on the next boot, and that is what eventually turned me away from UFS.
 
Thanks for all the replies.

After some reading and experimenting with ZFS snapshots, I have decided on a backup scheme:

- 2 mirrored SSDs for zroot
- Scheduled snapshots of zroot and webserver jail
- ZFS send + gzip snapshot archives to HDD
- Upload webserver archive to cloud storage
- Upload webserver data and config (e.g. /data, mysqldump, /etc) to cloud
- UPS (already present)

I think it strikes the right balance of cost and data loss prevention, in this case.
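For anyone curious, the "ZFS send + gzip to HDD" step can be sketched roughly as below. This is a simplified illustration rather than my exact script: the helper names, the `.zfs.gz` naming, and the `/backup` mount point are made up for the example, and it assumes the snapshot already exists.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def send_command(snapshot: str) -> list:
    """argv that serialises a snapshot to stdout; -R produces a replication
    stream that includes descendant datasets up to that snapshot."""
    return ["zfs", "send", "-R", snapshot]


def archive_path(snapshot: str, dest_dir: str) -> Path:
    """Map 'zroot/jails/web@daily-1' to '<dest_dir>/zroot_jails_web@daily-1.zfs.gz'."""
    return Path(dest_dir) / (snapshot.replace("/", "_") + ".zfs.gz")


def archive_stream(command: list, dest: Path) -> None:
    """Run `command` and gzip its stdout into `dest`, streaming in chunks so
    the whole stream never has to fit in memory."""
    partial = dest.with_name(dest.name + ".part")
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        with gzip.open(partial, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for the process, so returncode is set.
    if proc.returncode != 0:
        partial.unlink(missing_ok=True)
        raise RuntimeError(f"{command[0]} exited with status {proc.returncode}")
    # Only expose the archive under its final name once it is complete.
    partial.replace(dest)


def archive_snapshot(snapshot: str, dest_dir: str = "/backup") -> Path:
    """Archive one snapshot to the backup disk and return the archive path."""
    dest = archive_path(snapshot, dest_dir)
    archive_stream(send_command(snapshot), dest)
    return dest
```

Writing to a `.part` file first means a failed or interrupted send never leaves something that looks like a good archive. Restoring would be the reverse: decompress the archive and pipe it into `zfs receive`.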

My upgraded hardware is ECC-compatible; however, it was nearly impossible to get a decent ECC DIMM in time and at a reasonable price.

At the moment, I am running ZFS on a single disk with no issues.
 