Why is st_dev so large?

I've been playing around with FreeBSD and realized that the device IDs for my files are quite large. I've been poking around the kernel's source code but couldn't find the reason why.

Here's an example:

Code:
15744191626382938589 11347 drwxr-xr-x 21 renato renato 18446744073709551615 52 "Dec 11 08:32:30 2023" "Dec 11 12:49:12 2023" "Dec 11 12:49:12 2023" "Dec 11 08:32:30 2023" 131072 17 0x800 .

15744191626382938589 is a 64-bit number, which breaks some assumptions in a project's code I'm testing.

I'd just like to understand why that happens, and if it should happen.
 
stat(2)
Code:
COMPATIBILITY
       Previous versions of the system used different types for the st_dev,
       st_uid, st_gid, st_rdev, st_size, st_blksize and st_blocks fields.

If I recall correctly, this change happened with FreeBSD 12.0, when several of these fields were changed from 32-bit to 64-bit.
 
I noticed that it should be 64-bit, based on the sys/_types.h and sys/types.h definitions. What I don't understand is why my system returns such a high value for the local storage device; for devfs it gives a much lower value.
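
A quick way to confirm the width and signedness of dev_t on a given system is a small check program like the one below (nothing FreeBSD-specific beyond the headers; it only reports what the typedefs resolve to):
C:
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    /* On FreeBSD 12+ dev_t resolves to a 64-bit unsigned type
     * (__uint64_t in sys/_types.h); this prints its size and whether
     * the type is unsigned on the system compiling it. */
    printf("sizeof(dev_t) = %zu bytes\n", sizeof(dev_t));
    printf("dev_t is %s\n", ((dev_t)-1 > 0) ? "unsigned" : "signed");
    return 0;
}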

For context, this is breaking the tests for the CPython posix module, because the value is so high that it crosses the signed positive 64-bit boundary. Following sys/_types.h, it should really be treated as an unsigned 64-bit integer instead. Changing this fixes the bug, alongside a few other type changes.

But I'd like to know where that number comes from. From what I've understood, the major number is very large; it can't even fit in a regular signed 32-bit integer. I'd like to have a deeper understanding of this.

- https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L937
- https://github.com/python/cpython/blob/main/Lib/test/test_posix.py#L696-L698
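
To illustrate the boundary issue, here's a minimal sketch using the st_dev value from the stat output above (the exact wrap-around of the signed reinterpretation is implementation-defined in older C standards, but behaves this way on the usual two's-complement targets):
C:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The st_dev value from the stat output above. */
    uint64_t st_dev = 15744191626382938589ULL;

    /* It exceeds INT64_MAX (9223372036854775807), i.e. the high bit is set... */
    printf("unsigned: %" PRIu64 "\n", st_dev);

    /* ...so reinterpreting it as a signed 64-bit integer yields a negative
     * number, which is what trips up code assuming dev_t fits a signed type. */
    printf("signed:   %" PRId64 "\n", (int64_t)st_dev);
    return 0;
}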
 
Let me guess: The file you're looking at is on a ZFS file system?

The problem here is more fundamental. In the old days, one file system used exactly one disk, and each disk was used by (at most) one file system. (In the above sentence, the term "disk" really means block device.) The stat structure was born in those old days, when it made logical sense to identify a file system by the numeric ID of its disk, and for that the major/minor number (= dev_t) was perfect.

Why did this make sense? The only test one can usefully do with the st_dev field is to check whether two file system objects (typically files) are on the same file system, which is necessary, for example, when deciding whether to hardlink them to each other, or whether a file can be rename(2)'ed to another path without copying it. So the only operation one should do with st_dev is to compare two values for equality.
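
A minimal sketch of that equality check (the paths are just illustrative):
C:
#include <sys/stat.h>
#include <stdio.h>

/* Check whether two paths live on the same file system by comparing
 * st_dev for equality -- the only meaningful use of the field. */
int same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;                      /* stat failed */

    return sa.st_dev == sb.st_dev;      /* 1: hardlink/rename without copy could work */
}

int main(void)
{
    printf("same fs: %d\n", same_filesystem("/home/renato", "/tmp"));
    return 0;
}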

But today we don't live in that simple world any more. Modern file system software (such as ZFS) no longer has a 1-to-1 correspondence between disk drive and file system. For example, the home directory of my server is physically stored on disks /dev/ada2p1 and /dev/ada3p8 (it is mirrored), which have device numbers 0x98 and 0xb2, but because I use GPT labels, ZFS finds them under /dev/gpt/hd1[46]_home, which have device numbers 0xa0 and 0xcb. And in ZFS, one pool (which corresponds to a set of block devices, which are in turn parts of physical disks) can contain multiple file systems, so it wouldn't even work to construct a fake device ID, for example by concatenating the ones of the physical disks. Today, the file system ID has an m-to-n relationship with the device IDs of the disks.

The solution is that ZFS (and other such file systems) have to create virtual (that is: fake!) st_dev numbers. It so happens that ZFS chooses very large 64-bit numbers for st_dev. On your machine it happens to have the highest bit set; on my machine, it happens to be 3876826178434374726 for the home file system (which is a little smaller, and doesn't happen to have the highest bit set).

If the CPython unit tests have a problem with "negative" 64-bit st_dev numbers (only negative because they're interpreting a uint64_t incorrectly), the problem is with the unit test.
 
> Let me guess: The file you're looking at is on a ZFS file system?
YES! Earlier today I realized I'd forgotten to mention that. I also noticed that the device ID changes from partition to partition.

> It so happens that ZFS chooses very large 64-bit numbers for st_dev. On your machine it happens to have the highest bit set; on my machine, it happens to be 3876826178434374726 for the home file system (which is a little smaller, and doesn't happen to have the highest bit set).
That makes a lot of sense!

> If the CPython unit tests have a problem with "negative" 64-bit st_dev numbers (only negative because they're interpreting a uint64_t incorrectly), the problem is with the unit test.
Yeah, it's definitely interpreting them incorrectly.
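
For what it's worth, the safe way in C to hand a dev_t to anything that might assume signedness is to go through a wide unsigned type, e.g.:
C:
#include <sys/stat.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    struct stat st;

    if (stat(".", &st) != 0)
        return 1;

    /* Print st_dev through uintmax_t so a value with the high bit set
     * isn't misread as negative, regardless of how wide dev_t is. */
    printf("st_dev = %ju\n", (uintmax_t)st.st_dev);
    return 0;
}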
 
C:
        /*
         * The fsid is 64 bits, composed of an 8-bit fs type, which
         * separates our fsid from any other filesystem types, and a
         * 56-bit objset unique ID.  The objset unique ID is unique to
         * all objsets open on this system, provided by unique_create().
         * The 8-bit fs type must be put in the low bits of fsid[1]
         * because that's where other Solaris filesystems put it.
         */
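
Assuming the layout described in that comment carries through unchanged into the st_dev value (an assumption; the plumbing through the VFS isn't shown here), the number can be split apart like this:
C:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* st_dev value from the original post. */
    uint64_t st_dev = 15744191626382938589ULL;

    /* Sketch based on the quoted ZFS comment: low 8 bits = fs type,
     * upper 56 bits = objset unique ID. */
    uint64_t fs_type   = st_dev & 0xff;
    uint64_t objset_id = st_dev >> 8;

    printf("fs type:   %" PRIu64 "\n", fs_type);
    printf("objset id: %" PRIu64 "\n", objset_id);
    return 0;
}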
 
It is unfortunate that it is unsigned. I would have preferred to use just 55 bits for the ID and never use the sign bit.

One reason is interoperability with other languages and their bindings, as has shown up here.
 
> One reason is interoperability with other languages and their bindings, as has shown up here.

This shouldn't be a problem for Linux, although it looks like OpenBSD still uses 32-bit device IDs:

 