Solved /dev/ directory almost completely empty 4-8 hours after I start an external drive (over USB) backup [zfs send/recv]

Hello all,

Over the weekend I was able to successfully migrate my 8 year old Debian 10 server to FreeBSD 12.1 RELEASE (updated to p1, the latest patches as of the time that I'm writing this) and this has been working well for the most part. All of my critical services have been ported and are running properly: Apache, Postgres, Synapse (Matrix), Plex, Syncthing, etc. Originally the server was built back when I was a Gentoo Linux Developer and their ZFS maintainer, so this system was built as a ZFS on Linux box from the beginning. My backup strategy for the past 8 years (in relation to ZFS) consists of using some of my automated scripts to take specific snapshots and zfs send | zfs recv in an incremental fashion from one pool to another. The external pools are on two individual 4 TB HDDs. Each of these HDDs goes into an external USB 3.0 dock (plug one in, import the pool, back up, export, unplug, plug in the next one, etc). This strategy and the USB dock have been working great for many years. However, there was some weirdness happening on FreeBSD 12.1 RELEASE that I want to bring up. Maybe someone can help me with it (and/or maybe this is an actual bug at the kernel or hardware level).
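For readers unfamiliar with the approach, here is a minimal sketch of one incremental snapshot-and-replicate step. It is not my actual script: the pool names 'tank' and 'backup' match the ones I mention later in the thread, while the snapshot labels, the function name and the DRY_RUN switch are made up for illustration. The -u flag on zfs recv (leave received datasets unmounted) is also an assumption added for the sketch.

```shell
#!/bin/sh
# Sketch only. Snapshot labels and the DRY_RUN switch are hypothetical.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Take a recursive snapshot and send the delta since the previous one.
incremental_backup() {
    src="$1"; dst="$2"; prev="$3"; cur="$4"
    if [ "$DRY_RUN" = "1" ]; then
        echo "zfs snapshot -r ${src}@${cur}"
        echo "zfs send -R -i ${src}@${prev} ${src}@${cur} | zfs recv -d -u ${dst}"
    else
        # -R: replicate the dataset tree, -i: incremental from the previous snapshot
        # -d: keep the dataset path below the pool name, -u: do not mount on receive
        zfs snapshot -r "${src}@${cur}" &&
            zfs send -R -i "${src}@${prev}" "${src}@${cur}" | zfs recv -d -u "${dst}"
    fi
}

incremental_backup tank backup monday tuesday
```

Run as-is, this only prints the two commands, so it is safe to try on a machine without the pools.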

Basically, what I noticed was that some hours after I start backing up my pool, at about 460 GB in, the machine seems to stop operating. (Let's say this is a fresh backup of the entire snapshot and its contents, so in my case we are speaking about an initial zfs send of about 1 TB of data into an external pool sitting on a USB 3.0 dock with a 4 TB HDD.) It's not a crash, but if I try to SSH into the box it will say "PTY allocation request failed on channel 0 stdin". If I turn on my monitor directly on the server and log in, any command such as zpool status or even uptime will fail with different messages. The core problem seems to be that for whatever reason, after backing up for that many hours, something is completely unmounting or deleting everything in the /dev directory (I'm aware that /dev for the most part is a virtual filesystem, which makes me wonder what is causing the devfs mount to go away). The only thing remaining seems to be /dev/null and maybe one other device. Rebooting the machine brings back the server and the /dev directory is once again available.

Originally I was thinking that it was because I had tmux open with two panes, one with the zfs send command, and the other running gnu-watch (I also tried cmdwatch) as gnu-watch -n 0.5 "zfs list". I was thinking that since I was running the watch so frequently, leaving gnu-watch running overnight at that frequency was maybe causing something to leak (something that maybe affected the number of ptys available?) and thus eventually causing the /dev dir to mess up. With that train of thought, I started running a few experiments each night over the past few days until I got the system to stop crashing (in the sense I described above).

(Each of these runs I cleared the external drive completely - all partition labels, gpt, etc, and made a brand new pool)

1. First night was basically: zfs backup, tmux, gnu-watch running - result: /dev is gone at about 460 GB in, and snapshots stop being taken since /dev is gone and the system can no longer communicate with the hardware.
2. Second night I switched gnu-watch to cmdwatch since I thought it may be the culprit. It also stopped exactly at 460 GB in.
3. Third night I eliminated cmdwatch as well and just left the backup and tmux running. It stopped at about 466 GB in (4-6 hours after I went to sleep).
4. Fourth night I didn't back up anything and just left tmux up. The goal was to test my theory that backing up over the USB dock was the problem. This resulted in the system not crashing, and so far it's been up 11 hours.

So it seems that for whatever reason, FreeBSD does not like my USB 3.0 dock. On Linux this works fine. I do know that some people recommend not using USB 3.0 devices for this purpose. I'm open to suggestions, but my use case is simply to be able to take the entire pool on my home server and easily send it over to a single external drive that can fit all that data. Once that data is backed up, I want to be able to easily hot swap the drive and load up the next one, so I can have a total of 2 independent external backup drives: one that I can keep in a fireproof safe and/or another at an offsite location (parents' house? haha). Ideally I would like to resolve the USB 3.0/external dock issue since I think USB 3.0 will be the future in a lot of ways (and it's kinda already here). I'm also open to using eSATA if I can find a good eSATA PCI controller and a good external eSATA dock that both work well with FreeBSD. I would need the controller since I don't think my motherboard has an available eSATA port, but I will double check tonight.

The configuration for the server is as follows:

- 6x 1 TB HDDs. All of these are on RAIDZ2, total usable space is 3.5 TB
- swap is mirrored so crash dumps are apparently useless (Although the machine isn't hard crashing so I wouldn't see this. dmesg also doesn't seem to display any weird errors).
- External HDDs are each a 4 TB Seagate drive (That allows me to fit the entire pool onto each of these, and still have dual parity on the server itself)
- AMD FX 8120 8 core processor
- 8 GB of RAM

USB 3.0 Dock = INEO (I-NA32OU Plus)

When I have the dock powered on with a drive connected, boot up will be a bit slower since the boot up scripts will "wait till root device is ready" (when it's looking for the zfs root). Once it figures out that the dock doesn't actually have the root OS, it loads up the actual root pool and continues. Another thing I noticed is that there are messages that pop up when I first plug in the drive, starting with:

`usbd_setup_device_desc: getting device descriptor at addr 1 failed, USB_ERR_TIMEOUT`

This pops up about 5 times and then it says that it couldn't allocate the new device. After that umass0 kicks in, it seems to work, and the drive shows up as /dev/da0. The messages mention that it's treating the device as SCSI over Bulk-Only. I think this may also be contributing to the issue, since I read about USB Attached SCSI ("UASP"), and I think this message means that my current dock doesn't actually support UASP even though it's still USB 3.

Pictures

Last but not least, this may be useful info, though I'm not sure if it is a problem on BSD: on this machine, on Linux, the kernel has had issues with devices that I connected to the USB 3.0 ports due to IOMMU issues. I've had to set some particular flags on the Linux kernel command line (things such as setting IOMMU to passthrough) in order for devices on USB 3.0 ports to work properly. On BSD everything seems to be fine, and the devices I plug into the USB 3.0 ports seem to work even though I didn't set any IOMMU specific settings.

Lots of information above, but if there is anything more I can provide, definitely let me know and I'll report back! I'm glad to be here and looking forward to being a contributing member of the community.

EDIT: At the moment I'm looking into purchasing an eSATA PCIe controller with an ASM1062 chipset since it seems to be well supported in FreeBSD. It says it supports port multipliers, but I'm not using it for that specifically; I will be plugging in a single drive over one of its eSATA ports at a time.

Either:

Ableconn PEX-SA130 2-Port eSATA III 6Gbps PCI Express Two Lanes Host Adapter Card - AHCI Port-Multiplier PCIe 2.0 x2 Controller Card - ASMedia ASM1062 Chipset

or

Ableconn PEX-SA134 4-Port eSATA III 6Gbps PCI Express Four Lanes Host Adapter Card - AHCI Port-Multiplier PCIe 2.0 x4 Controller Card

And for the external eSATA dock I'm thinking of going with the one below. This is where I'll be avoiding the port multiplier feature, primarily by getting an external eSATA dock that supports only a single drive. From my understanding, the port multiplier kicks in when 2 or more drives are plugged into the eSATA dock itself, simultaneously connected and trying to transfer data over a single eSATA cable to a single eSATA port on the controller; that is when the command switching for the "supported port multiplier" should kick in. If it's just a single eSATA drive on a single cable, there should be no command switching since it will just be a single stream?

iDsonix SuperSpeed USB 3.0/2.0/eSATA to SATA Hard Drive Docking Station For 2.5/3.5in HDD/SSD Tool Free Design - Supports 4TB+HDD Premium 12V2.5A Power Adapter 3.3 Ft USB3.0/eSATA Cable included

Particularly because the above provides me the following:

- eSATA port. It doesn't say if it supports SATA III @ 6 Gbps speeds, only that it is SATA III compatible. There was another similar product from Orico that I liked, but that one said it's SATA III compatible at only 3 Gbps speeds, which makes me sad :( haha - direct quote: "Offers USB 3.0 and eSATA interfaces with data transfer rates up to 5Gbps and 3Gbps respectively, backward compatible with USB 2.0 and 1.1".
- Also has USB 3 w/ UASP support (So maybe that will help as well if I ever wanted to try USB 3 again)
- Tool free, which allows me to easily switch between both of my external HDDs to make sure each of them has an up-to-date pool.
- Provides dedicated power (DC)

This one [Vantec 3.5" SATA 6Gb/s to USB 3.0/eSATA HDD Enclosure (NST-366SU3-BK)] is also a strong candidate and I may end up going with it. The only downside is that it's just a regular hard drive enclosure, so it will be a little more finicky for me to swap the backup drives, because I'm primarily just using it as an adapter for each of these drives. But it does explicitly state that it supports SATA III @ 6 Gbps and also has UASP support.

EDIT 2:

I've purchased the Vantec enclosure since it is guaranteeing UASP and SATA III @ 6 Gbps. I can always purchase another "more easily swappable" enclosure after I backup my data to the external drives and feel better. Regardless, if this works, it will be a pretty good enclosure for single drives, I may even get another one just for the convenience. I'll be getting this tomorrow, so I can try to see what FreeBSD output says regarding "Bulk Only" and I can see if it will mess up /dev again.

I've also purchased the "Ableconn PEX-SA130 2-Port eSATA III 6Gbps PCI Express Two Lanes Host Adapter Card", I'll be getting this on Friday and I can test it as well (I believe this is compatible with my mobo, I'll open up the box and check, worst case scenario I can return all of these since Amazon usually has a pretty good return policy.)

EDIT 3:

The eSATA PCIe card and eSATA dock came in and I installed them. I ran two tests: a USB test, to see whether not using Bulk Only would let it succeed now that this dock has UASP, and an eSATA test.

The USB approach ended up crashing the /dev directory and application at around 465 GB (460-466 GB as my lowest and highest ranges, the dataset it is backing up is 469 GB). The only file left in /dev was 'null'.

I also noticed when I plugged in the USB dock/HDD that FreeBSD detected this as a Bulk Only device as well; there was no reference to it using UASP rather than Bulk Only as the protocol for communication. So I think there may be something wrong with FreeBSD's driver.

The eSATA method was going extremely fast, at a full 1 GB per 6 seconds (roughly 170 MB/s, well within SATA III's 6 Gbps link rate). However, after it got to 466 GB, it crashed as well.

So at this point this is pretty bad since it looks like a low level driver problem. The eSATA controller is an ASM1062, which is compatible with FreeBSD, and other people have mentioned it was good. So I'm thinking there is something wrong with the kernel/zfs code. If I can't back up my data off of the RAID array via zfs send/recv, I can't really use FreeBSD for my server's OS and will need to switch back to Linux.

EDIT 4:

Adding a picture of the error alongside the nuked /dev dir.


This person from a May 2013 email seems to have had a similar issue - https://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073600.html

EDIT 5:

As a temporary workaround, I was able to successfully back up all 654 GB (639G compressed on main pool) of data I have from my main system to the external zfs pool over rsync. I just can't use zfs replication to do it due to this issue.
 
I've made a few updates to the main OP to avoid multi-posting. I've hit a dead end though, please read EDIT 3 above.
 
So I'm an idiot. There is no bug. Basically, on Linux I always used to do 'zpool create -N <>' or 'zpool create -R <>', or both, because I wanted to avoid mountpoint collisions. On BSD I didn't use those flags and just imported the backup pool directly. As a result, when I sent my 'tank' datasets to 'backup', the /dev directory, alongside everything else sitting there, would collide with the OS side. So that explains why my /dev directory vanished; the OS probably freaked out. Once I used the flags above, I avoided the collision and the replication now succeeds. It also explains why rsync worked: it was just copying files to the other side, so no collisions.
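For anyone hitting the same thing, here is a rough sketch of importing the backup pool so that its datasets cannot mount over the running system. Assumptions: the altroot path /mnt/backup, the helper names and the DRY_RUN switch are hypothetical; -N (import without mounting) and -R (alternate root) are the zpool import options I believe correspond to the flags mentioned above.

```shell
#!/bin/sh
# Sketch only. "/mnt/backup" and the helper names are hypothetical.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Import the pool under an alternate root without mounting any datasets,
# so a received mountpoint such as / resolves to a path below the altroot.
import_backup_pool() {
    pool="$1"
    altroot="$2"
    run zpool import -N -R "$altroot" "$pool"
}

# List where each dataset would mount and whether it is currently mounted.
show_mountpoints() {
    run zfs list -r -o name,mountpoint,mounted "$1"
}

import_backup_pool backup /mnt/backup
show_mountpoints backup
run zpool export backup
```

With the pool imported this way, the mountpoint column should show paths under the altroot rather than paths that shadow the live system. Checking that listing before starting a long zfs send | zfs recv is cheap insurance.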
 