Setup ZFS on occasionally detached external drive?

Hello. First of all, I hesitated to post this question because I asked a similar one about FAT32 earlier this week. I was concerned that repeating the question might lead to a dismissive reply suggesting I haven’t done my research.

I found a YouTube video by Garry where he formats some external hard drives with ZFS, but I'm still not sure.


I have a 750GB spinning hard drive connected via USB to my computer. I want to use it to store music files that will only exist on this drive. My computer has a 240GB SSD running FreeBSD with ZFS.

Is it possible to format the external hard drive with ZFS if this drive will occasionally be removed from the USB port? In ZFS terms, I need a striped format, right?

Also, if the external hard drive has a folder named "Music" containing all the albums, would it be better to make that folder a dataset? I ask because at some point I would like to have an identical copy of that folder on another USB-connected hard drive as a backup, though I don't have that disk yet. In the past I did this kind of transfer with rsync, but I'm thinking that might not be necessary with ZFS.

I would appreciate any suggestions.
 
[…] Is it possible to format the external hard drive with ZFS if this drive will occasionally be removed from the USB port?
Absolutely. (Thread 72174, …)
In ZFS terms, I need a striped format, right?
In ZFS terms you create one pool with a single disk as its only "virtual device" (see zpoolconcepts(7)). The term "stripe" is usually associated with spreading data across multiple disks, so it is a bit odd to call the one disk in a single-disk setup a stripe (although technically it is not incorrect).
Also, if the external hard drive has a folder named "Music" containing all the albums, would it be better to make that folder a dataset? I ask because at some point I would like to have an identical copy of that folder on another USB-connected hard drive as a backup, though I don't have that disk yet. In the past I did this kind of transfer with rsync, but I'm thinking that might not be necessary with ZFS. […]
Familiarize yourself with the zfs-send(8)/zfs-receive(8) workflow. You need snapshots for that, and snapshots can only be taken of datasets. If you retain the last snapshot you synchronized from, you can send an incremental stream, i.e. copy only the data that has changed since then.
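A minimal sketch of that workflow, assuming the music lives in a dataset called extdata/Music and the future backup drive carries a pool called "backup" (both names are placeholders):

# initial full copy: snapshot the dataset and send the whole stream
zfs snapshot extdata/Music@2024-01-01
zfs send extdata/Music@2024-01-01 | zfs receive backup/Music

# later: take a new snapshot and send only what changed since the previous one
zfs snapshot extdata/Music@2024-02-01
zfs send -i extdata/Music@2024-01-01 extdata/Music@2024-02-01 | zfs receive backup/Music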
 
You can certainly format the USB drive for ZFS. It will be a standalone pool, not associated with your main drive in any way, so there won't be any striping or parity to configure. Just be sure to export the pool before detaching the drive. Some USB drives like to go to sleep aggressively when inactive (which can make ZFS unhappy if the drive takes too long to spin back up); there are any number of ways to prevent that if it becomes an issue.
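For example, something like this (the pool name "extdata" is assumed, matching the setup later in the thread):

# before unplugging the drive
zpool export extdata
# after plugging it back in
zpool import extdata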

I would certainly recommend creating a dataset for your music directory. In general, it’s best to avoid putting data at the root level of a pool, because it removes some flexibility down the road.

If you have another (ZFS) drive that you'll be using for backup in the future, you can certainly use ZFS send/receive for the initial copy and then later incremental updates. This will be much faster than rsync, especially if you have a large number of files.
 
Trivial to do this. Roughly, as root (assuming the device shows up as /dev/da0):

# destroy any existing partitioning
gpart destroy -F /dev/da0
# create a GPT scheme
gpart create -s gpt /dev/da0
# create a single partition using the whole drive, aligned on a 1M boundary,
# with the label "extdrive"; this lets the drive show up at /dev/gpt/extdrive
gpart add -a 1m -l extdrive -t freebsd-zfs /dev/da0

# set the ashift; not needed if using 13.x or higher
sysctl vfs.zfs.min_auto_ashift=12
# create a zpool named "extdata" using the gpt label
zpool create extdata gpt/extdrive
# create a dataset named Music on the external drive; the default mountpoint
# will be something like /extdata/Music
zfs create extdata/Music
# change the mountpoint if desired
zfs set mountpoint=/Music extdata/Music

You can also adjust permissions to give your user access to read and write directly instead of trying to use send/receive semantics.
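A rough sketch of that, assuming your login name is "fred" (a placeholder) and the dataset ends up mounted at /Music as in the commands above:

# give your regular user ownership of the mounted dataset so it can
# read and write the music files directly
chown -R fred /Music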

As Eric A. Borisch points out, before detaching the drive, as root:
zpool export extdata

A normal shutdown (not a laptop suspend) should do the right thing, and a normal startup should also do the right thing as far as mounting the device goes.
 
Thank you for your help, I really appreciate it. I will try to always export before disconnecting the drive, but in case of a power outage, what should I do?

My external USB caddy might be putting the hard drive into standby mode; actually, I think all of them do. I need to check how to turn that off.
 

You can easily put ZFS encryption on it too, nice for an external drive.

zfs create -o encryption=on -o keyformat=passphrase storage/secret

And after you've exported it and want to import it again, do

zpool import -l storage
 
If you are in the middle of writing to the device when the power goes out, you obviously lose that data. I think the default ZFS transaction group interval is around 5 seconds these days (it used to be 30), so potentially that much of the most recently written data. A transaction group is basically a batch of data waiting to be written out.
If the device is not being written to, it should be internally consistent, so after power-on it should be fine.
If the caddy is putting the device into standby, that should be fine too; access may be "slow" until it wakes up. I've seen that on a couple of external drives I have.
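If you want to check the interval on your own system, it is exposed as a sysctl (this is the OpenZFS tunable as it appears on FreeBSD):

# current transaction group timeout, in seconds
sysctl vfs.zfs.txg.timeout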
 
You can easily put ZFS encryption on it too, nice for an external drive.

zfs create -o encryption=on -o keyformat=passphrase storage/secret

And after you've exported it and want to import it again, do

zpool import -l storage

That sounds good, although these disks will always be in my possession. Does that encryption have any performance impact on the computer?
 
I would recommend using geli encryption instead, unless you have the use case of ZFS-sending encrypted backups to an untrusted third party. (And so long as FreeBSD is the only system that needs to read the drive.)

And timeouts can cause ZFS to offline the device, depending on how long it takes to spin up. Consider a script that periodically reads from the drive directly to keep it awake.
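A throwaway sketch of such a script; the device node /dev/da0 and the four-minute interval are assumptions, and anything shorter than the enclosure's idle timer will do:

#!/bin/sh
# read one sector from a random offset every few minutes so the enclosure's
# idle timer never expires and the drive stays spun up
while true; do
    dd if=/dev/da0 of=/dev/null bs=512 count=1 skip="$(jot -r 1 0 1000000)" 2>/dev/null
    sleep 240
done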

A power loss can/will lead to lost data (when there is data written just before power loss, and not synced to disk) but ZFS is very robust against filesystem corruption.
 
Writes will be very rare. Once the content is on the drive, it will only be read.
Thank you. I appreciate your time. If I don't use encryption, is that a problem? Can it be done later?
 
If I don't use encryption, is that a problem? Can it be done later?
Native ZFS encryption is, I believe, applied per dataset, so you'd need to create a new dataset with encryption enabled and then move the files over. GELI would need to be set up "from the start".
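If you do decide to add it later, a rough sketch could look like this (dataset names and the /Music mountpoint follow the earlier example and are assumptions):

# create an encrypted sibling dataset and copy the music into it
zfs create -o encryption=on -o keyformat=passphrase extdata/Music.enc
cp -Rp /Music/. /extdata/Music.enc/
# once the copy is verified, drop the old dataset and take over its mountpoint
zfs destroy extdata/Music
zfs set mountpoint=/Music extdata/Music.enc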
 
I will try to always export before disconnecting the drive, but in case of a power outage, what should I do?
Nothing.

Seriously: Drives getting disconnected and reconnected happens all the time in the real world. ZFS handles it "just fine". If you look at the zpool status output while the drive is missing, you will see messages indicating that the pool is degraded or unusable. That may sound scary, but it is exactly the truth. When the drive is reconnected, that will clear itself. Doing the export/import cycle prevents those scary messages, but is technically not even necessary.
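For example (again assuming the pool is named extdata):

# check the pool's health after reconnecting the drive
zpool status extdata
# clear any leftover error counters from the disconnect
zpool clear extdata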

My external USB caddy might be putting the hard drive into standby mode; actually, I think all of them do. I need to check how to turn that off.
You can also leave standby enabled and let the drive spin down. When there are no accesses (you are not reading or writing your music), ZFS won't even notice that the drive is offline, because there are no IOs. The only risk is that the first IO that makes the drive turn itself back on fails and causes an error (for example, because the drive takes so long to spin up that something times out). That's annoying, but in and of itself not dangerous. You'll just get scary-looking error messages and error statistics, which you have to learn to ignore.

Unlike some other systems, ZFS will not, as far as I know, keep using the drive once the foreground (user) workload has quiesced, unless there is a long-running scrub operation or the pool needs resilvering (which won't happen on a single-disk pool). So ZFS won't prevent your drive from going into standby mode.

You can easily put ZFS encryption on it too, nice for an external drive.
That makes sense if (a) the data on it is valuable in the sense that you don't want it leaked, and (b) there is a risk of the drive falling into the wrong hands. But it also has a cost: a small complexity cost (having to remember the passphrase and type it in when reconnecting) and some performance overhead. And a small risk: forgetting the passphrase. If the OP doesn't need it, don't use it.
 
Yes, encryption isn't needed. I have a lot of audio CDs that I want to rip to FLAC. I'm thinking about using ZFS because it offers good data protection. The FLAC utility has a checksum option, but it only covers the audio data. If bit rot occurs, recovery is only possible with a tool like PAR2. However, using PAR2 would require adding an extra 20% to 30% of the album's size for each album. At the end of the day, ZFS would be better. I'm on a tight budget and can't afford a NAS right now, but I'll use another drive to occasionally send incremental backups until I can get a NAS. Thank you for any suggestions. You guys helped me immensely. I really didn’t expect such a high level of support.
 
I'm thinking about using ZFS because it offers good data protection. The FLAC utility has a checksum option, but it only covers the audio data. If bit rot occurs, recovery is only possible with a tool like PAR2. However, using PAR2 would require adding an extra 20% to 30% of the album's size for each album. At the end of the day, ZFS would be better.
To make sure it's clear, ZFS on a single drive can detect bit rot, and it will tell you which files are corrupt, but it still needs extra copies of the data to repair it. ZFS datasets have a copies= setting that will store extra copies for recovery, but obviously that requires more space.
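For instance, on the dataset from earlier in the thread (names assumed), you could keep two copies of every block and run an occasional scrub so silent corruption is detected and, where a good copy exists, repaired:

# store two copies of every block written from now on (roughly doubles the space used)
zfs set copies=2 extdata/Music
# periodically verify all data in the pool and repair whatever can be repaired
zpool scrub extdata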
 
To make sure it's clear, ZFS on a single drive can detect bit rot, and it will tell you which files are corrupt, but it still needs extra copies of the data to repair it. ZFS datasets have a copies= setting that will store extra copies for recovery, but obviously that requires more space.

Yes, sir. You're right. I have this in mind. The data won't stay on a single drive; it will also be saved on a second ZFS disk. The only thing is that these disks won't always be connected, but I will keep them synchronized. I will never add new data to the first disk until the second one has received a copy of it as well. I mean: write new data to the first disk, zfs send, write new data to the first disk, zfs send, and so on. Apart from that, the drive will be used only for reading, which accounts for 90% of its usage; writing new data will be less frequent.
 
To make sure it's clear, ZFS on a single drive can detect bit rot, and it will tell you which files are corrupt, but it still needs extra copies of the data to repair it.

FWIW, it doesn't just tell you the file is corrupted; it effectively deletes it. The file is still visible in its directory, but any attempt to access it results in an I/O error.
 