ZFS on Backup USB Drive

I just found a 2 terabyte USB hard drive on clearance for $70 :) and bought it for backups. I intend to back stuff up onto it, then disconnect it and lock it in a drawer.

My initial intention was to use a FAT32 file system on the drive so I could access it from windoze, unix, linux - whatever. However, FAT32 takes forever to mount, provides very little in the way of data safety, and doesn't support large files. (I have several files in the 50-100 gig range.) I use ZFS on my main hard drive and like the features it provides, so I was thinking of using it on this drive instead of FAT32. However, I have questions.


First, the drive comes up as /dev/da0, the same device node as my FAT32 flash drive (which cannot be changed). Aside from manually specifying the file system type, is there a way to make a specific device get a specific (different) /dev node when it's connected?
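
(I've since seen glabel(8) mentioned for exactly this - writing a label onto the disk so it always shows up under /dev/label/ no matter which da number it gets - though I haven't tried it myself, and the label name below is just an example:)

Code:
# glabel label -v backupdisk /dev/da0
# zpool create mybackup /dev/label/backupdisk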

Second, how advisable is it to use ZFS in such a way? Would I be better served using something like reiserfs/ext3, or should I just stick with FAT32?

Third, how would I create the ZFS pool, mount it, unmount it, etc.? As indicated before, I know how to set up ZFS with zpool when installing my system, but I do not know how I'd unmount/stop the drive so it's safe to disconnect and unplug.


Thanks in advance.
 
Use the pool name to import and export it.

Connect the disk.

# zpool import mybackup

Do the backup.

# zpool export mybackup

Unplug the disk.
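
If the pool doesn't exist yet, create it once before the first backup. A single-disk pool on the whole device would be something along these lines (adjust the device node to whatever your drive gets):

Code:
# zpool create mybackup /dev/da0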
 
Well, you know that a ZFS drive will not be so easy to access from Windows/Linux.

I don't have a USB drive yet, but I recently tested some filesystems that have a transparent compression feature (ZFS, BTRFS, and NTFS).
The test was on a flash drive, and mounting/unmounting was manual.

ZFS was very good, as was NTFS. BTRFS seems very experimental, and I didn't get the expected results:
Code:
$ mkfs.btrfs /dev/sdc1

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

The tests weren't scientific or a benchmark; I just used the drive for some days, carrying some data, watching read/write speed and CPU usage, and always checking the compression ratio. =)
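
(On ZFS, for reference, compression is a per-dataset property and the achieved ratio can be queried afterwards - roughly like this, assuming a pool named mybackup:)

Code:
# zfs set compression=on mybackup
# zfs get compressratio mybackup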

You can always go with ReiserFS, a great filesystem that you can use with no problems on Linux.
But it's read-only on FreeBSD. =|

Cheers.
 
kpa said:
Use the pool name to import and export it.

Connect the disk.

# zpool import mybackup

Do the backup.

# zpool export mybackup

Unplug the disk.

Worked perfectly - thank you! :)
 
Also for the record:
I found using ZFS and some other fs at the same time to be a PITA, because ZFS is (was?) not tied into the memory management the way other file systems are. Copying data from an XFS- or ext2-formatted disc to a pool resulted in good performance until main memory was taken up by the inactive memory cache of the XFS or ext2 file system. Then the ARC came under pressure and performance tanked to floppy levels.

@ruler: This sounds like a used disc, isn't it? May I suggest running SMART tests on that thing every chance it gets and testing the backup every time? ZFS is good at catching transmission glitches from USB, but only if it gets the chance.
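
(With sysutils/smartmontools a long self-test would look roughly like this - assuming the USB bridge passes SMART commands through at all; many need an extra -d sat:)

Code:
# smartctl -t long /dev/da0
# smartctl -a /dev/da0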
 
That's what the vfs.zfs.arc_max tunable in /boot/loader.conf is for. Autotuning otherwise works fine on ZFS (on 8-STABLE or 9.0), but you still have to limit the maximum size of the ARC cache manually.
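
For example (the value is only an illustration - pick something that leaves room for the rest of the system):

Code:
# /boot/loader.conf
vfs.zfs.arc_max="2G"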
 
Maybe I was too quick to call this one solved.

I started the copy yesterday and left it copying when I went home. It's still connected and the process shows as running, but cp is consuming 0.0% CPU time and it's stuck at 900 out of 988 gig in the first directory to copy. I tried Control-C to break out, tried killing the process, and finally kill -9 - nothing breaks out of the stalled copy process. I can't export the pool because the device shows as busy.

Any ideas as to what happened or (better yet) how to fix it?
 
What most likely happened is that the USB protocol dropped a packet and the connection state needs a reset. Possible sources are bad cabling, driver issues, or problems with the disc itself.

@kpa: vfs.zfs.arc_max sets a maximum for the ARC, but that is the opposite of the problem. The ARC gets pushed out of existence, and the performance of the pool is then absolutely abysmal. I would need to check whether a vfs.zfs.arc_min setting fixes things. But in this regard I found ZFS a little bit lacking compared with the caching performance of other file systems.
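
(vfs.zfs.arc_min is likewise a loader tunable, so a floor could be set the same way - the value below is only an illustration:)

Code:
# /boot/loader.conf
vfs.zfs.arc_min="1G"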
 
Crivens said:
@ruler: This sounds like a used disc, isn't it? May I suggest running SMART tests on that thing every chance it gets and testing the backup every time? ZFS is good at catching transmission glitches from USB, but only if it gets the chance.

Actually, it's brand new from Staples sealed in the package. Clearance is good. :)
 
Crivens said:
What most likely happened is that the USB protocol dropped a packet and the connection state needs a reset. Possible sources are bad cabling, driver issues, or problems with the disc itself.

How would I go about doing this - simply unplug it? Since I'm doing the initial copy of the data, losing anything that's currently on it is not a big deal. However, I'd rather go with a different file system if this is going to be a recurring problem. I'll run a block test on the drive after I disconnect it from my main system to be certain the disk is good.
 
Ruler2112 said:
How would I go about doing this - simply unplug it?
Yes, but this is going to lose some data for sure when it is not known what caused the initial hang. BTW, do you use a separate power supply for the disc?
Ruler2112 said:
Since I'm doing the initial copy of the data, losing anything that's currently on it is not a big deal.
Well, if I found any backup medium to be faulty, no matter why, my trust in it would be somewhere below my trust in governments or con artists. ;) Finding out the hard way that the backup is damaged is something I would rather not do.

Ruler2112 said:
However, I'd rather go with a different file system if this is going to be a recurring problem. I'll run a block test on the drive after I disconnect it from my main system to be certain the disk is good.

The file system has nothing to do with this; you can perhaps trigger these hangs by doing a dd from the raw device as well. That would test whether the problem is in the file system or below it, somewhere at the hardware or driver level.
In this case I would suggest sticking with ZFS, as it has checksums for everything and stores important metadata more than once, so the chances of detecting a bad backup are good, and the chances that a backup is damaged beyond use are far lower than if you used ext2/3 or (yuk!) FAT.
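
(The raw-device test would be a plain sequential read of the whole disc, something like the following - it exercises the USB path with no file system involved:)

Code:
# dd if=/dev/da0 of=/dev/null bs=1m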
 
Yes, separate power adapter for the drive. Connects with a weird double-width micro-USB cable. I tried doing a zpool export -f backup and it locked the console. I also tried doing a zpool status -v backup and got the following:

Code:
# zpool status -v backup
  pool: backup
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE      83     1     0
          da0       ONLINE      83     1     1

errors: Permanent errors have been detected in the following files:

        backup:<0x0>
        backup:/Images/<snip>

It listed a bunch of files, then locked that console as well. Funny thing is that it says the pool/device is online even though the drive is off/disconnected. Figuring I might as well start over, test the drive, and recreate everything, I did a zpool destroy backup to be greeted with:

Code:
# zpool destroy backup
cannot open 'backup': pool I/O is currently suspended

Same exact output when I do a zpool destroy -f backup. At least the console didn't lock on these commands, but it appears as though I cannot force it to release the pool so I can start over (or do anything else with it for that matter). If I plug in a flash drive with FAT32, I am able to mount/umount it successfully, so it appears to be something in the ZFS/zpool layer. Probably have to shut everything down and reboot to clear it? (Sigh... this is one of the reasons I like *nix OSs instead of windoze - you generally don't have to power off in order to clear errors. :( )
 
You may need to unload and reload the USB driver to clear this error. In the case of GENERIC this means reloading the complete kernel (a.k.a. rebooting). After that you may want to flatten the pool by filling the disc with random data.
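
(The filling could be done with something like this - destructive, of course, so only on a disc whose contents you can afford to lose:)

Code:
# dd if=/dev/random of=/dev/da0 bs=1m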

External power adapters for USB discs are a must IMHO, but you may get 50/60 Hz noise from different phase/neutral mappings of the power adapters. From audio hardware you may know the 50/60 Hz hum which goes away when you turn the right power adapter around in the socket.
This may sound like voodoo, and maybe it is for some combinations of hardware, but I have seen it work. An oscilloscope with a good pickup may show you the noise on the shielding.
 
Crivens said:
You may need to unload and reload the USB driver to clear this error. In the case of GENERIC this means reloading the complete kernel (a.k.a. rebooting). After that you may want to flatten the pool by filling the disc with random data.

External power adapters for USB discs are a must IMHO, but you may get 50/60 Hz noise from different phase/neutral mappings of the power adapters. From audio hardware you may know the 50/60 Hz hum which goes away when you turn the right power adapter around in the socket.
This may sound like voodoo, and maybe it is for some combinations of hardware, but I have seen it work. An oscilloscope with a good pickup may show you the noise on the shielding.

The Firefox process wouldn't die on my system, even though the window had closed last night. (I found this out when I tried to start it up this morning and it did nothing except complain that it was already running. Same behavior as the locked zpool commands above - even kill -9 failed to affect them.) It turned out that no process would really end - its window would just disappear. Because of this, I had to reboot this morning. It took quite a while, with BSD waiting a minute for every single running process to stop.

I now have the drive connected to a system with a Linux LiveCD, running badblocks -svn to do a non-destructive test of every block on the drive; I have no idea how to do the same under BSD, or if it's even possible. Based on how big the drive is, I'd guess the scan should complete sometime close to 2016, but at least I'll know where I'm at with the hardware afterward. If it confirms the drive is good, I'll probably use something other than ZFS, because I cannot have what happened here as a likely possibility for the backups I'm storing on the drive.
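
(For reference, the full invocation is roughly the following, where -s shows progress, -v is verbose, and -n selects the non-destructive read-write mode; the device name is whatever the Linux system assigned:)

Code:
# badblocks -svn /dev/sdb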

I *really* hate to lose the ability to store big files, but so far FAT32 has been solid as a file system for backups at home & work, plus it's about as close as one can get to universal across different OSs. I've read about how superior ZFS is, especially compared to a non-journalled FS like FAT, but the fact remains that I tried backing up data onto an external HDD using ZFS once and it failed once. Compare that to the 100+ drives I have stuff backed up onto using FAT with not a single failure of this sort.


I don't know how to respond to the last part of your post, except to say that I have only the vaguest idea of what you're talking about. I'm using the power adapter that came with the drive, and it's polarized, so it can only plug in one way. I do not own an oscilloscope, nor do I have the foggiest idea of how to use one or what it would tell me if I did have one. :(
 
Ruler2112 said:
...Based on how big the drive is, I'd guess the scan should complete sometime close to 2016,
Let me guess, somewhere around the 30th of February?
Ruler2112 said:
I've read about how superior ZFS is, especially compared to a non-journalled FS like FAT, but the fact remains that I tried backing up data onto an external HDD using ZFS once and it failed once. Compare that to the 100+ drives I have stuff backed up onto using FAT with not a single failure of this sort.
Let me point out one way to rephrase that: you may not get problems from other file systems because they do not detect problems.

But what you have on your hands here looks more like a hardware error or driver problem. There is one thing I hugely like about ZFS and would not want to miss in any case: the internal checksums for everything. Every disc I replaced on my storage machine was replaced because ZFS told me the thing was in trouble before SMART was able to say anything. And all these faults were either self-healing (copies >= 2) or easy to restore from other media (copies=1 for /storage/DVDs/...). And when it comes to backups, I want this. There is only data which is backed up with good redundancy, and data which is about to be gone.
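
(The copies property is set per dataset - the dataset name below is just an example:)

Code:
# zfs set copies=2 storage/important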

Ruler2112 said:
I don't know how to respond to the last part of your post, except to say that I have only the vaguest idea of what you're talking about. I'm using the power adapter that came with the drive, and it's polarized, so it can only plug in one way. I do not own an oscilloscope, nor do I have the foggiest idea of how to use one or what it would tell me if I did have one. :(
What I meant is the wall socket. Pull the power adapter from the wall socket and turn it around. Depending on a lot of things which have to do with how the adapter is actually built, how it is connected, whether the data lines are correctly decoupled, and so on, you can have a steady 50/60 Hz noise on the digital lines which may play havoc from time to time. The best example of this is when you use external audio amplifiers and hear the hum, which goes away when you turn the power cord around in the wall socket.
An oscilloscope would make such signals visible, but you need a good one (not your sound card interface) and you need to know how to handle it. I do not have one myself, but I know where to go for one, and luckily there I will find someone who can operate such equipment a lot better than me :stud .
 
Oops, I just realized you may not be able to turn a connector around in a wall socket if it has grounding. That is always possible with the connectors used in Europe, but the US uses a different geometry, doesn't it?
 
Ruler2112 said:
Maybe I was too quick to call this one solved.

I started the copy yesterday and left it copying when I went home. It's still connected and the process shows as running, but cp is consuming 0.0% CPU time and it's stuck at 900 out of 988 gig in the first directory to copy. I tried Control-C to break out, tried killing the process, and finally kill -9 - nothing breaks out of the stalled copy process. I can't export the pool because the device shows as busy.

Any ideas as to what happened or (better yet) how to fix it?

Never ever use a basic program like cp to copy backups this large; I prefer rsync.
If it gets stuck it will retry a couple of times by default, copies permissions, owner, etc., and shows the progress plus bandwidth:
Code:
# rsync -avP /myfiles/* /mybackup/

Oh, and in case it breaks off completely, a rerun will not copy the same files twice, as rsync skips files that match in size and modification time.
 
In the US, plugs can be straight 2-prong, polarized 2-prong, or grounded 3-prong. The straight 2-prong may be plugged in either way - you can basically reverse the polarity if you want. The polarized 2-prong plugs have one prong slightly larger than the other, so they can ONLY fit one way, as one hole in the outlet is slightly bigger. The grounded 3-prong can also go in only one way, as the two prongs for hot and neutral are flat and the ground prong is round.

As for the drive, as mentioned before, I started running badblocks -svn on it through a Linux live CD. It completed yesterday and shows 0 errors, so I don't know what to think. Maybe I'll try rsync instead of cp on ZFS - thanks for the tip.

The only thing that makes me hesitate at this point is that I've copied many terabytes of smaller (1-2 gig each) files over USB on the same machine, with FAT32 on the external drives (for cross-platform compatibility), using cp and never had a problem. It makes me wonder why, the first time I try ZFS on an external drive, there's a problem that basically took down my machine. (Unlike windoze, that's not easy to do with BSD. :) )
 
I now believe that there's either an incompatibility between the USB controller on my system's motherboard and the SATA->USB bridge chip in the enclosure the drive came in, or something quirky about ZFS on an external HDD like this. Yesterday, I left rsync running, backing up the aforementioned 988 gig in one directory of my main file system. (The main file system is 3 one-terabyte drives running ZFS in raidz.) I got in this morning and, instead of being in X windows, my system was sitting at a login prompt. At first I thought we'd lost power for longer than my UPS could hold out, but my other systems were unaffected. dmesg revealed the following:

Code:
(da0:umass-sim0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
<the above was repeated about 150 billion times>
panic: solaris assert: 0 == dmu_read(os, lr->lr_foid, off, dlen, buf, DMU_READ_NO_PREFETCH), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, line: 1072
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805f4e0e at kdb_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff820a7e86 at zfs_get_data+0x1e6
#3 0xffffffff82087e98 at zil_commit+0x548
#4 0xffffffff820a035d at zfs_sync+0xcd
#5 0xffffffff8065431a at sync_fsync+0x16a
#6 0xffffffff806524be at sync_vnode+0x15e
#7 0xffffffff806527b1 at sched_sync+0x1d1
#8 0xffffffff805994f8 at fork_exit+0x118
#9 0xffffffff8089547e at fork_trampoline+0xe
Uptime: 7d10h53m31s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Stopping other CPUs
Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011
    root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

Given that this drive ran for over 4 days connected to a laptop booted from the System Rescue CD Linux LiveCD, with every single block of the drive tested, I can't believe that the drive is bad. (I am somewhat concerned that the SCSI police will come arrest me, since it did an illegal request... usually, it's women who complain about my doing that. ;) :D )

Any ideas?
 
One idea would be to run the live CD on your desktop and see if the same hardware works with Linux - that way it can be narrowed down somewhat.
 
Crivens said:
One idea would be to run the live CD on your desktop and see if the same hardware works with Linux - that way it can be narrowed down somewhat.

Good idea, and one I had considered. However, my system runs several processes that we can't have down for an extended period of time, even over a weekend.
 
Ruler2112 said:
However, my system runs several processes that we can't have down for an extended period of time, even over a weekend.
.oO(torrents? Soap opera addicts?)

*SCNR*



You may give the live CD some time if you can spare it; one or two hours should cover systematic hardware failures. Sporadic errors or other heisenbugs are something else and are a lot harder to nail down anyway - that would be the next step then.

Reading the free eBook from debuggingrules.com may pass the waiting time and also give some nice tips on how to nail down elusive problems ;)
 
Crivens said:
.oO(torrents? Soap opera addicts?)

*SCNR*



You may give the live CD some time if you can spare it; one or two hours should cover systematic hardware failures. Sporadic errors or other heisenbugs are something else and are a lot harder to nail down anyway - that would be the next step then.

Reading the free eBook from debuggingrules.com may pass the waiting time and also give some nice tips on how to nail down elusive problems ;)

No, not torrents. ;) It's a script I wrote that must be available overnight when a process fires on a third-party server. It saves us roughly 4 grand a year, but some critical data is lost if it's not available on the network. There are other scripts that fire at certain times of the week/month and save other employees literally days of work every quarter, but I can work around those pretty easily. The system being up nightly is pretty critical, though.

I ran the LiveCD on my primary system for 3-4 hours to eliminate possible hardware incompatibilities, but nothing abnormal happened. This problem has been on the back burner since I tried fixing a small problem and screwed up the box, but I think I'm going to need to revert my system to a previous OS state, and thus need a reliable backup in the very near future in case something goes wrong.

I'm really fine with FAT32, except that it doesn't support large files. Maybe create 2 partitions and use something else on the second one for the >4 gig files??? Any suggestions for something that would be read-write on BSD and preferably the same (or even just read-only) on windoze?
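
(For the two-partition idea, I assume the slicing itself would be something along these lines - untested, destructive to the drive's contents, and with ntfs only as a placeholder type for whatever the second file system ends up being:)

Code:
# gpart create -s mbr da0
# gpart add -t fat32 -s 1500G da0
# gpart add -t ntfs da0
# newfs_msdos -F 32 /dev/da0s1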
 
Ruler2112 said:
Any suggestions for something that would be read-write on BSD and preferably the same (or even just read-only) on windoze?

Ext2 comes to mind. I think there was a driver around for it on Windows, but you could also use NTFS via NTFS-3G.
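
(For reference, on FreeBSD both should be mountable roughly like this - ext2 with the in-kernel driver, NTFS through the fusefs-ntfs port; the slice number is just an example:)

Code:
# mount -t ext2fs /dev/da0s1 /mnt
# ntfs-3g /dev/da0s1 /mnt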
 
Much to my dismay, I decided to go with NTFS for the backup drive, then had to figure out how to get ntfs-3g working on BSD. I installed the fusefs-ntfs port (which for whatever reason insisted on pulling in ruby) and was able to mount the drive read-write, though it shows up as 'fusefs' after mounting. Using rsync to copy the files is working awesome, especially since the copy still randomly stops. (Putting a load on my system tends to make it die more frequently, but it also fails overnight when I'm not anywhere near the system. It always stops at different points and in different files.) I don't know if this is indicative of a drive problem, but given the extensive testing done to this point, I have to doubt it. Below is the error message that I've been getting.

Code:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (61346 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

Any ideas of what underlying condition this represents or how to fix it?
 