ZFS send/recv: possible to change properties?

Hello, apologies if this has already been asked; I have Googled extensively and was unable to find any definitive solution that worked on FreeBSD.

I use ZFS to manage my data and find it's a handy way of backing it up to removable USB drives. However, the removable USB drives, being of rather limited capacity, are becoming quite full, and if possible I would rather not split the backup across multiple drives (increased chance of physical loss, more points of failure, etc., at least depending on how you look at it).

Anyway, my problem is this. I use a homebrew solution to send/recv stuff: basically a new snapshot, then zfs send -R filestore/filesys@new | zfs recv -Fvu backup/filesys for full backups, and zfs send -i filestore/filesys@old filestore/filesys@new | zfs recv -dvu backup/filesys for incrementals. Full backups are usually just for new backup drives, with old ones just being pruned occasionally.
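In script form it boils down to roughly this (the date tag is just how I happen to name snapshots; PREV is whichever snapshot is already on the backup):
Code:
#!/bin/sh
# homebrew backup sketch: "full" for a fresh backup drive, otherwise incremental
TODAY=$(date "+%d%m%y")
zfs snapshot -r filestore/filesys@${TODAY}

if [ "$1" = "full" ]; then
        zfs send -R filestore/filesys@${TODAY} | zfs recv -Fvu backup/filesys
else
        PREV=$2
        zfs send -i filestore/filesys@${PREV} filestore/filesys@${TODAY} | \
            zfs recv -dvu backup/filesys
fi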

The trouble is, the filesystems containing the more valuable data have copies set to 3, which is great on a live system but probably not required on backups (though some may disagree). I would like the receiving end to change that value to 1 and also to turn on compression on particular filesystems (though not all: no point trying to compress my music and photo storage, for instance), but I can't figure out how to do it. It seems that neither send nor recv has an option to change attributes, I can't do it in advance as it complains that the destination filesystem already exists, and I suspect that doing it afterwards would also mess up further incremental updates. Plus it would be very slow: my USB controller does 30 MBytes/sec on a good day, so it takes over a day to run a full backup. I did have a much faster card (FreeBSD doesn't seem to support many at the affordable end) but it died after a few weeks. :/ But I digress.

Any thoughts? I'm currently using a Nov 2016 vintage 12.0-CURRENT and should really upgrade but have been avoiding the potential trauma; if newer versions do what I need I am prepared to bite the bullet, though.

tl;dr version: zfs send/receive: sending filesystem has the properties copies=3 and compression=off; I want the receiving filesystem to have copies=1 and compression=on. Is this possible on 12.0-CURRENT?
 
copies and compression only affect subsequent writes to a file system, so you can set them temporarily while you perform the initial replication. After that as long as you are receiving incremental sends the properties won't be changed.
 
Why even bother with zfs recv? I really wouldn't put a ZFS pool onto a single (USB) drive; UFS would be much better for that (for example, it is more robust and has better recovery options, such as backup superblocks).

So, for example, after taking a snapshot: # zfs send zroot/home@$(date "+%d%m%y") | gzip | dd of=/opt/backup/home-$(date "+%d%m%y").zfs.gz.

That will most definitely save quite a bit of space.

(edit)

And if you need to grab a single file from a backup just access .zfs/snapshot in the root of your ZFS filesystem.
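For example (file and snapshot names made up):
Code:
# ls /home/.zfs/snapshot/
170818/ 180818/ 190818/
# cp /home/.zfs/snapshot/170818/peter/some-lost-file.txt /home/peter/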
 
Thanks for your replies.

My understanding is that zfs send/recv do their thing pretty much atomically, and any attempt to interrupt that to change attributes means that subsequent incrementals will no longer work due to a pre-existing filesystem or that there is an integrity problem due to something having changed. But "my understanding" is often the equivalent of "ye're doin' it wrong", which is why I'm asking! Preferably with a handy idiots' guide.

And the reason I'm not storing the raw ZFS send output is that the documentation has severe warnings that the format could change and be meaningless, which rather put me off storing it in its raw form. I did tinker with the idea of just using basic rsync on a UFS filesystem but that loses some of the ZFS benefits like snapshots and may well take much longer on my very slow USB-connected drives.

The whole situation is complicated further by storing them on a GELI-encrypted partition since part of my backup plan is to have them outside of a secure area in case said secure area burns down, and I'm just as clueless about GELI as I am about ZFS, not least about the possible interactions between them...
 
Oh dear, so many things going a bit weird here...

I just re-read your first post: why on earth use CURRENT? No offense intended here, but given the kind of questions you ask (nothing wrong with those!) I don't understand why you'd want to use a developer snapshot.

CURRENT is not meant for production, there aren't even any guarantees that the thing will actually run, not to mention the risk of backdoors and other exploitable code (there's a ton of debugging code in there, which isn't optimized for security (nor efficiency) at all).

Upgrading is basically building and installing a new version. If you'd use a supported release you could simply enjoy the ease of using freebsd-update.

My understanding is that zfs send/recv do their thing pretty much atomically, and any attempt to interrupt that to change attributes means that subsequent incrementals will no longer work due to a pre-existing filesystem
Incorrect.

Think Unix... all you're doing with zfs send is creating a stream of data. Nothing more, nothing less. Creating incremental backups basically means sending the differences between snapshots. See also zfs(8). So as soon as you create a snapshot you're basically creating a new (fixed) point in time, and then you can send incremental data (so: the differences between the most recent and the previous snapshot).
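For example (snapshot names picked to match my retention scheme below):
Code:
# zfs snapshot zroot/home@200818
# zfs send -i zroot/home@190818 zroot/home@200818 | gzip > /opt/backup/home-200818.incr.zfs.gz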

But seriously, why bother?

ZFS is at its best if you create a so-called redundant pool, meaning that you use two or more storage units in your pool, which effectively makes sure that if one unit dies your pool won't go down with it, because the data is still present on the other unit.

Code:
peter@zefiris:/home/peter $ zpool status zroot
  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 2h0m with 0 errors on Tue Jan 16 06:06:25 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors
This is when ZFS works at its best. Not necessarily a mirror, but having multiple storage units within the pool. It also helps ZFS safeguard data integrity, because it can compare checksums between the two storage units and notice (and repair) any differences that occur (which could indicate data rot, for example).
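Setting one up is a one-liner, for example (pool and device names made up):
Code:
# zpool create tank mirror ada2 ada3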

This also means that any snapshots which I make are decently safe. Of course it helps to keep external backups, but despite that you can also rely (to a certain degree) on snapshot retention as an easily accessible backup:
Code:
peter@zefiris:/home/.zfs/snapshot $ ls
170818/ 180818/ 190818/ 200818/ 210818/ 220818/ 230818/
1 week retention; the mirror ensures that my data won't be lost because of one drive breaking down. I send one full stream once per week and then use an incremental send a few days later.
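Creating and pruning those snapshots is trivial, e.g.:
Code:
# zfs snapshot zroot/home@$(date "+%d%m%y")
# zfs destroy zroot/home@170818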

But there are more arguments for storing your data in file format: it'll be more efficient when restoring.

See... in my situation the external backups are my 'last line of defense', I only rely on them when something really bad has happened which means that I need to gain access to my data ASAP. Not just the data, but the meta-data as well (ZFS filesystem properties for example). If you have everything stored as a file then it's pretty easy:

# ssh backup@host "cat /opt/backups/backup.zfs.gz" | gunzip | zfs recv zroot/home.

This is slightly more efficient because your system wouldn't need to process data to generate a stream, then send it over, and then process the data again to generate the snapshot / filesystem. Instead you generate a stream from a file, send that, and it gets processed.

Please note though that I'm not claiming that this is the best way of doing it. Quite frankly there really isn't such a thing as a 'best' way, because that heavily depends on your own setup. But I will say that this can be much more efficient than storing a full copy of the filesystem.

And the reason I'm not storing the raw ZFS send output is that the documentation has severe warnings that the format could change and be meaningless, which rather put me off storing it in its raw form.
That is only partially correct. Although it is true that ZFS versions change over time and that formats also change, you're forgetting a very important aspect: backwards compatibility:

Code:
breve:/home/peter $ zpool status zroot
  pool: zroot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
Newer versions of ZFS drivers will always be able to access older data formats. If that weren't the case, then what good would backups be?

It is for that same reason that I can maintain the situation as it is above. My ZFS pool hasn't been upgraded even though the server itself has (from FreeBSD 10.x to 11.2). So there is absolutely no reason to worry about this.

Heck... With my FreeBSD 11.2 environment I can even access ZFS backups which I made with Sun Solaris 10/x86. Talk about old ;)

Anyway, hope this can help to create some more clarity.
 
ZFS works fine with Geli. As ShelLuser points out, the ZFS stream format is very stable, provided the obvious (sender version <= receiver version) holds.
If you're performing incrementals then you'll want to be confident they haven't been damaged in transit and can be all stitched back together again. Receiving the data at the other end satisfies this concern. You can also zpool scrub the drive periodically to verify the contents. If you take ShelLuser's approach then I'd suggest testing the restore procedure against a spare machine. If it is somebody else's data that's good practice anyway.
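The scrub is easy to automate, e.g. (assuming the backup pool is called backup, as in your commands):
Code:
# zpool scrub backup
# zpool status -v backup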
I'm slightly curious about copies=3 though - isn't failure of a whole drive much more likely than losing two out of three copies of a block? And does lz4 compression add enough overhead to be a practical concern?
 
My use of CURRENT is for historic reasons: actually very old historic reasons, dating back around 20 years to when something I needed at the time (it says a lot that I can't even remember what it was...) was only tentatively supported in CURRENT and not at all in anything else. I guess it's probably time for me to move away from CURRENT because the randomness of "will today's snapshot actually compile, let alone work?" is too much and I'm getting too old for that sort of thing, especially considering I'm not working on system development. :D

I'm aware of Unix and its love of byte streams, but it's a double-edged sword: today's standard can be tomorrow's meaningless garbage. Not meaning to be at all obtuse there, just that the last time I read about it (which is admittedly a long time ago) the warnings about storing ZFS send data and expecting it to be readable at any point in the future were very explicit: ZFS pools are certainly backwards compatible, but at least at the time, the ZFS send format was decreed to be arbitrary. I dare say that may have changed since then but I remain a bit twitchy. But the main reason I want a pool on my backup drive rather than UFS and archives, rsync or whatever is speedy backups (having mentioned my rather glacial transfer rates) and also handy snapshots.

As for my server, which is just the thing that lives under my desk at home, that's not doing anything especially interesting or contrived other than using CURRENT: ZFS is using RAID-10 (or is it 0+1? I can never remember. A stripe-set of mirrors, anyway) with online spares but for obvious reasons (like the house burning down, our mains being struck by lightning, two drives failing at once etc) I like to keep backups that are not on the physical machine nor in its vicinity. And being on a limited budget, that means 2½" drives (realistically 2TB unless I use Seagate which makes me nervous) in weatherproof enclosures I can secrete in random places.
 
ZFS works fine with Geli. As ShelLuser points out, the ZFS stream format is very stable, provided the obvious (sender version <= receiver version) holds.
If you're performing incrementals then you'll want to be confident they haven't been damaged in transit and can be all stitched back together again. Receiving the data at the other end satisfies this concern. You can also zpool scrub the drive periodically to verify the contents. If you take ShelLuser's approach then I'd suggest testing the restore procedure against a spare machine. If it is somebody else's data that's good practice anyway.
I'm slightly curious about copies=3 though - isn't failure of a whole drive much more likely than losing two out of three copies of a block? And does lz4 compression add enough overhead to be a practical concern?
It seems that things may have changed regarding the send/recv format since I read those dire warnings, then! Either that or I misunderstood them, which is not unlikely.

The multiple copies thing is due to quite a few experiences of drives with bad blocks that were unrecoverable: some of these were back in the bad old days of ESDI, but I've also had SCSI drives that haven't completely failed but have experienced serious bad block problems, and on more occasions than I would write off as just being randomness. So I tend to be a bit cautious about that sort of thing, even when stuff is mirrored and backed up... Er, so to answer your question, my experience is that partial drive failures were more common than complete drive failures.

As for compression, nowadays I guess not using it is probably more likely to slow things down considering the discrepancy between CPU speeds and IO rates...
 
Create the zfs filesystem on the usb drive first. Set the properties you want on it.

Then do a normal send/recv, not a replication send (remove -R), and the properties set on the usb drive won't change.
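One way to do it, roughly, using your names (lz4 picked just as an example; set the properties on the parent so the received filesystem inherits them when zfs recv creates it, since a plain send carries no properties):
Code:
# zfs set copies=1 compression=lz4 backup
# zfs snapshot filestore/filesys@new
# zfs send filestore/filesys@new | zfs recv -vu backup/filesys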
 
Actually compression on ZFS datasets wins in almost all cases vs. uncompressed. It takes fewer read operations to fetch the compressed data from the disk than it would without compression. After the data is read, the time taken by decompression is totally insignificant compared to the time it took to read the data from the disk.
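You can see the effect for yourself afterwards with e.g.:
Code:
# zfs get compression,compressratio backup/filesys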
 
Create the zfs filesystem on the usb drive first. Set the properties you want on it.

Then do a normal send/recv, not a replication send (remove -R), and the properties set on the usb drive won't change.
So could I do this and then send incremental updates to it? That's really what I'm looking for: partly to reduce the time taken to backup, partly because having multiple revisions is quite useful.


Actually compression on ZFS datasets wins in almost all cases vs. uncompressed. It takes fewer read operations to fetch the compressed data from the disk than it would without compression. After the data is read, the time taken by decompression is totally insignificant compared to the time it took to read the data from the disk.
Yeah, I'm a bit embarrassed that I'm only just starting to think about this. Reading two blocks plus negligible CPU time to decompress is going to be way faster than reading three blocks, for instance. Though depending on how the compression works it could befuddle seeks to specific locations: I imagine that's a problem that has long since been solved, though...
 
From zfs(8):
The format of the stream is committed. You will be able to receive your streams on future versions of ZFS.

so being able to receive a stream in the future is one of the project's commitments. That said, receiving into a live filesystem is much more useful than saving the byte streams. (Since you can dive in and easily pull out a file.) This seems like a good point for a shameless plug of my zfs_versions script for looking at old (snapshot) versions of a file ;).

So could I do this and then send incremental updates to it? That's really what I'm looking for: partly to reduce the time taken to backup, partly because having multiple revisions is quite useful.

Yes, you can do incremental sends after your initial full replication, provided the 'base' snapshot of the incremental send is the latest state on the destination. A good way to make sure this is the case is to set the filesystems on the backup to readonly=on. (You can still receive streams into a readonly filesystem; the readonly part applies to the exposed POSIX layer.) Be very wary of using zfs recv -F [...] unless you have carefully read and grokked the man page... especially the parts about removing snapshots / filesystems on the receive side that don't exist on the source side. You shouldn't need -F unless something has really gone sideways (or you really want the snapshot/filesystem removal process to happen.)
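A sketch of the ongoing cycle (dataset and snapshot names are just illustrative):
Code:
# zfs set readonly=on backup/filesys
# zfs snapshot filestore/filesys@240818
# zfs send -i filestore/filesys@230818 filestore/filesys@240818 | zfs recv -vu backup/filesys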

Incremental updates are awesome. Compared to doing rsync crawls of large filesystems, it took my backup times from almost an hour to seconds (if there were no updates) or ~rate-limited / linear with the actual quantity of changes.

There are very few cases where I would ever consider turning off lz4 compression; for your backup drive (depending on the compressibility of the data) you may even consider gzip over lz4 to eke out a little more space (at the cost of performance / CPU time)... note there are gzip-[1-9] options available, and you can adjust on the fly, too.
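For example:
Code:
# zfs set compression=gzip-6 backup/filesys
# zfs set compression=lz4 backup/filesys      # switch back any time; only affects newly written blocks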
 
I should obviously re-read those bits of the manual page I'd just assumed had remained static, then! Thanks to everyone who's pointed out that the zfs send format is stable and doesn't arbitrarily change from one invocation to the next! But yeah, being able to access it as a live filesystem rather than basically as a tape archive makes life a lot easier. That zfs_versions script is also of interest to me as I'm frequently pulling stuff out of old snapshots and it's sometimes a bit of a pain doing it manually.

Some other interesting tips there, not least the readonly flag: there's no good reason the backup should be writeable, after all. It's been so long that I forget why I used the -F flag on the full backup version; I think it was just an expedient way to grab everything, but I have to admit that finding a suitable backup procedure was a bit of a case of trial and error: I knew basically what I wanted but was more vague about how to accomplish it. I had used rsync prior to that but discontinued it due to speed and other limitations.

One message that's coming through loud and clear is to use compression. So I guess that I should do that. I'll need to look into potential problems with lseeks but as I mentioned I'd be surprised if that hadn't been figured out long ago, and I'm sure everybody wouldn't be recommending it if e.g. indexed files crawled to a stop when using it (though I imagine the gigabytes of cache mitigate that just a bit).

And yeah, external backups are absolutely my final port of call if all else fails, and testing them is definitely good practice. I had a bit of a scare the other day when my server wouldn't boot after I'd replaced a drive that had developed a fault: I never did find out what made the boot loader so unhappy, but keeping the boot partition on UFS seems like a not unreasonable workaround (the kernel, once loaded, didn't care about whatever it was and never even mentioned it).
 
One message that's coming through loud and clear is to use compression. So I guess that I should do that. I'll need to look into potential problems with lseeks but as I mentioned I'd be surprised if that hadn't been figured out long ago, and I'm sure everybody wouldn't be recommending it if e.g. indexed files crawled to a stop when using it (though I imagine the gigabytes of cache mitigate that just a bit).

There's a zfs parameter to balance this as needed for your workload/dataset: recordsize.

First, some background: ZFS file payloads are all stored on disk in blocks accessed "indirectly" via (potentially multiple levels of) block pointers. With this architecture, "seeks" are achieved by stepping (directly, this isn't a "search") through the block-pointer tree to locate the physical address of the block on disk containing the desired offset. Once this address is located, the physical disk read is performed (and decompress/compress if needed) of the block containing the requested offset -- note the time required for the block-pointer walk is identical for compressed or uncompressed files, as the blocks are of fixed/known logical size, while the physical allocation for each block may be (due to compression) differently sized.

It is also worth noting that the [L2]ARC operates on the compressed blocks these days, so using compression not only improves your on-disk performance (by reading fewer blocks) but also your [L2]ARC efficiency, by making it "larger" for compressible data. See here for a deeper dive and discussion of compression benefits and here for reference.

So if you crank up the recordsize, one of the primary drawbacks is the latency added to seeks: now a larger record may need to be read (and potentially decompressed) for the final part of the seek to complete in the in-memory buffer. So it would be "bad" to use a large recordsize on, for example, a large database file -- and tuning down the recordsize is primarily used for just this case: reducing latency for databases (or "indexed files") that need to seek to and modify small records inside a large file. (Again, see zfs(8).) Using small recordsizes unnecessarily on files that don’t have that access pattern reduces the opportunity for compression, and increases the overhead of the block-pointer walk (and overhead for storing all the block pointers).

So like many things, there is a trade off, and ZFS lets you choose. As far as latency impact of compression during a seek, it will only be the time required to decompress one block, independent of the size of the file.
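For example (dataset names hypothetical):
Code:
# zfs set recordsize=16K tank/db        # small records: cheaper seeks and partial rewrites for a database
# zfs set recordsize=128K tank/backup   # the default: better compression, less block-pointer overhead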
 
Why all this hassle though? Why not make snapshots based on priority and base the incremental part on the timing of those snapshots?

Seems like a lot less hassle to me.
 
There's a zfs parameter to balance this as needed for your workload/dataset: recordsize.

First, some background: ZFS file payloads are all stored on disk in blocks accessed "indirectly" via (potentially multiple levels of) block pointers. With this architecture, "seeks" are achieved by stepping (directly, this isn't a "search") through the block-pointer tree to locate the physical address of the block on disk containing the desired offset. Once this address is located, the physical disk read is performed (and decompress/compress if needed) of the block containing the requested offset -- note the time required for the block-pointer walk is identical for compressed or uncompressed files, as the blocks are of fixed/known logical size, while the physical allocation for each block may be (due to compression) differently sized.
To my shame I have absolutely no idea how ZFS physically arranges its data; your description sounds broadly similar to how UFS and its predecessors store it, though I would hazard a guess it's probably more like a B-tree; not extent-based, though, by the sound of it. I have some serious reading to do. And not before time, considering how long I've been using it: given that I like to know how things work, I feel I've been more than a little amiss in satisfying my curiosity, and now here I am asking basic questions as a result!

It is also worth noting that the [L2]ARC operates on the compressed blocks these days, so using compression not only improves your on-disk performance (by reading fewer blocks) but also your [L2]ARC efficiency, by making it "larger" for compressible data. See here for a deeper dive and discussion of compression benefits and here for reference.

So if you crank up the recordsize, one of the primary drawbacks is the latency added to seeks: now a larger record may need to be read (and potentially decompressed) for the final part of the seek to complete in the in-memory buffer. So it would be "bad" to use a large recordsize on, for example, a large database file -- and tuning down the recordsize is primarily used for just this case: reducing latency for databases (or "indexed files") that need to seek to and modify small records inside a large file. (Again, see zfs(8).) Using small recordsizes unnecessarily on files that don’t have that access pattern reduces the opportunity for compression, and increases the overhead of the block-pointer walk (and overhead for storing all the block pointers).

So like many things, there is a trade off, and ZFS lets you choose. As far as latency impact of compression during a seek, it will only be the time required to decompress one block, independent of the size of the file.
I definitely have my reading work cut out for me. I'd never really thought about the subject before but I am curious as to how an lseek will be translated into a physical block number (if that's even how it works these days) on a compressed file, especially if the pre-compressed data is "lumpy" in terms of its compressability. I shall check out your pdf there once my brain has come online... whenever that might be.

Why all this hassle though? Why not make snapshots based on priority and base the incremental part on the timing of those snapshots?

Seems like a lot less hassle to me.
I'm not totally sure what you're referring to, but that's possibly due to me being hard of thinking after a bad night's sleep (a.k.a. "I slept": saying "badly" is a bit of a tautology). Certainly in terms of incrementals, that's in essence what I'm doing: each time I run my backup thing, it creates a new snapshot using today's date as a tag (if it's not already there) and I run an incremental update since the last snapshot on the backup. It's getting the original full copy of the data onto the backup in the first place that was flummoxing me; "don't use the -F flag" seems to be a good start and I'll experiment. Sigh, I really am being slow, considering I've only just realised I can experiment using a small, temporary test filesystem rather than waiting the best part of a fortnight for my /home filesys to trickle down my USB.

Talking of which, I really must find a better PCIe USB controller, but that really is getting off the point. Again. D:
 
I'd never really thought about the subject before but I am curious as to how an lseek will be translated into a physical block number (if that's even how it works these days) on a compressed file, especially if the pre-compressed data is "lumpy" in terms of its compressability.

The file is logically cut up into constant-sized blocks (pre-compression; size <= recordsize), which are then compressed and written to disk; the locations of these blocks are stored (except for very small files) in blocks of indirect block pointers (which themselves are always compressed, if I recall correctly, since they frequently have large regions of zeros). Irrespective of how compressible or variably-compressible the file is, the block pointer path to walk is wholly determined by the logical offset, the levels of indirection, and the data block size for that file.

From the second reference:
Note on Block Ids: Given a block id and level, ZFS can determine the exact branch of indirect blocks which contain the block. This calculation is done using the block id, block level, and number of block pointers in an indirect block. For example, take an object which has 128KB sized indirect blocks. An indirect block of this size can hold 1024 block pointers. Given a level 0 block id of 16360, it can be determined that block 15 (block id 15) of level 1 contains the block pointer for level 0 blkid 16360.
level 1 blkid = 16360 / 1024 = 15 (integer division)
This calculation can be performed recursively up the tree of indirect blocks until the top level of indirection has been reached.
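Or, in shell terms, plain integer division:
Code:
$ echo $(( 16360 / 1024 ))
15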
 
vometia

Not willing to dismiss or neglect the knowledge of the present fellows on the ZFS subject, but if you are interested in in-depth details about the ZFS implementation in FreeBSD you may want to contact allanjude@ on IRC (Freenode) or by e-mail. He is the guy for ZFS. :)
 
A rather belated reply but thank you for everything, including the subsequent replies: the compression/block ID thing is still confusing me, though that's not new. Often I just don't "get" something... until I do, at which point it's a bit of a case of, "oh yeah, why didn't I realise that before?" Anyway, I shall keep looking periodically until it sinks in.

I have now updated my backup script using the advice I've received here and reduced the size of the backed up data by around a third, which gives me a bit more breathing space. Still glacially slow thanks to the slightly rubbish USB controller on my MB but as mentioned that's a discussion for elsewhere!

Something else that may deserve its own topic, but I figured I'd start here first: an assessment of my backup system overall. I could do with some input just to identify weak spots, potential pitfalls and other things I should watch out for. The fundamentals are that I'm using 2TB laptop drives in weatherproof USB cases. These are GPT-partitioned to contain a boot area; a UFS area for a full system boot should I need it, which also holds backups of other gubbins like my backup script; and the bulk of the drive, which is a GELI-encrypted partition containing a zpool. Why GELI? Because the drives will not always be in a secure area (i.e. my house, in case it burns down or something) and it's there; and why ZFS? Because it's handy for incremental backups and snapshots, and is browseable whenever I need to do so.
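For concreteness, each drive is set up roughly like this (device names and sizes are made up for illustration, and I'm skipping the bootcode and newfs steps):
Code:
# gpart create -s gpt da0
# gpart add -t freebsd-boot -s 512k da0     # boot area
# gpart add -t freebsd-ufs -s 32g da0       # UFS area: rescue boot plus scripts and suchlike
# gpart add -t freebsd-zfs da0              # the rest: GELI with a zpool on top
# geli init /dev/da0p3
# geli attach /dev/da0p3
# zpool create backup /dev/da0p3.eli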

I have several such drives and cycle through them on an erratic basis leaving them in assorted different locations.

Potential concerns: GELI metadata backups. I need to keep these somewhere, and logically they may as well go on the UFS area of the same drive, to avoid one thing getting lost and rendering the whole procedure an exercise in unrecoverable data. The only thing I'm not sure about is the security implications of doing so, and I've been unable to find a definitive answer. AFAICT the metadata backup can't be used to decrypt the drive without the passphrase, which is my concern; and I'm only using a passphrase for encryption rather than anything more exotic. This isn't national-security-level encryption, just something to keep out casual browsing and other nosiness should one of the drives sprout legs and go walkies.
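(The metadata backup itself is simple enough; the device names are the same made-up ones as above:)
Code:
# mount /dev/da0p2 /mnt
# geli backup /dev/da0p3 /mnt/da0p3-geli.backup
# umount /mnt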

My other concern is a drive developing a bad sector, given that they are only small and fragile laptop drives. I'm guessing this will result in some (hopefully minor) data loss but not total data loss, but I don't know the implications of a bad sector on a GELI provider with a zpool on top of it. Anybody know?
 
You may be interested to know that "ZFS native encryption" is expected to hit -CURRENT just after the embargo is lifted, and should come out in 12.1-RELEASE (however, it will probably not be fully tested yet).
 
Thanks! That may make life a bit easier*. I'm also thinking I should sit tight before finally doing what I should've done at least a decade ago and moving from CURRENT to STABLE: as previously mentioned, there were reasons, but it's been a headache, rather unsurprisingly. But I guess before committing myself to a random act of randomness and moving a 2016-vintage CURRENT 12.0 to STABLE 11.2 or whatevs, I should really wait just a bit longer for 12.1 to happen.

* well, ish: it will mean I need to rewrite my meandering backup script yet again! But I imagine having the two things integrated might improve resilience to attention from unwanted gremlins. Or, indeed your avatar, now that I think about it. :D
 