C/C++ disk io

joske

Member

Reaction score: 7
Messages: 23

Hi,

I'm helping develop btop, a top/htop-like program. I wrote the FreeBSD-specific code (and the macOS-specific code), and I'm struggling with the disk I/O feature. Btop can show I/O ops, reads and writes per second per mountpoint, but I can't find APIs to do this in FreeBSD. I found code for devstat(3), but it reports stats per block device. That would be fine for 'normal' mountpoints, but the default installation uses ZFS, so you can't really map the device names you get from devstat to the mountpoints you get from getmntinfo(3).

Can anyone suggest a way to do this?

The code for this is here:

See freebsd/btop_collect.cpp.

Cheers,
jos
 

grahamperrin

Son of Beastie

Reaction score: 1,006
Messages: 3,418

Cool!

Re: sysutils/bpytop I see <https://github.com/aristocratos/bpy...ebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R39>,

… third iteration of bashtop->bpytop. It's being written in C++ and will simply be called btop. …

… io ops/read/write per second per mountpoint. … ZFS …

Will the list of mount points be sortable?

Code:
root@mowa219-gjp4-8570p-freebsd:~ # mount | sort
/tmp on /compat/ubuntu/tmp (nullfs, local)
/usr/home on /compat/ubuntu/home (nullfs, local, noatime, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/rescue on /usr/local/poudriere/data/.m/main-default/01/rescue (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/usr/lib32 on /usr/local/poudriere/data/.m/main-default/01/usr/lib32 (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/usr/share on /usr/local/poudriere/data/.m/main-default/01/usr/share (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/usr/src on /usr/local/poudriere/data/.m/main-default/01/usr/src (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/usr/tests on /usr/local/poudriere/data/.m/main-default/01/usr/tests (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/.m/main-default/ref/var/db/ports on /usr/local/poudriere/data/.m/main-default/01/var/db/ports (nullfs, local, read-only)
/usr/local/poudriere/data/packages/main-default/.building on /usr/local/poudriere/data/.m/main-default/01/packages (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/data/packages/main-default/.building on /usr/local/poudriere/data/.m/main-default/ref/packages (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/jails/main/rescue on /usr/local/poudriere/data/.m/main-default/ref/rescue (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/jails/main/usr/lib32 on /usr/local/poudriere/data/.m/main-default/ref/usr/lib32 (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/jails/main/usr/share on /usr/local/poudriere/data/.m/main-default/ref/usr/share (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/jails/main/usr/src on /usr/local/poudriere/data/.m/main-default/ref/usr/src (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/jails/main/usr/tests on /usr/local/poudriere/data/.m/main-default/ref/usr/tests (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/ports/default on /usr/local/poudriere/data/.m/main-default/01/usr/ports (nullfs, local, noatime, read-only, nfsv4acls)
/usr/local/poudriere/ports/default on /usr/local/poudriere/data/.m/main-default/ref/usr/ports (nullfs, local, noatime, read-only, nfsv4acls)
/usr/ports/distfiles on /usr/local/poudriere/data/.m/main-default/01/distfiles (nullfs, local, noatime, nosuid, nfsv4acls)
/usr/ports/distfiles on /usr/local/poudriere/data/.m/main-default/ref/distfiles (nullfs, local, noatime, nosuid, nfsv4acls)
/var/cache/ccache on /usr/local/poudriere/data/.m/main-default/01/root/.ccache (nullfs, local, noatime, nfsv4acls)
/var/cache/ccache on /usr/local/poudriere/data/.m/main-default/ref/root/.ccache (nullfs, local, noatime, nfsv4acls)
august on /copperbowl (zfs, local, noatime, nfsv4acls)
august/ROOT/n250650-ef396441ceb-a on / (zfs, local, noatime, nfsv4acls)
august/VirtualBox on /usr/local/VirtualBox (zfs, local, noatime, nfsv4acls)
august/iocage on /copperbowl/iocage (zfs, local, noatime, nfsv4acls)
august/iocage/download on /copperbowl/iocage/download (zfs, local, noatime, nfsv4acls)
august/iocage/download/12.0-RELEASE on /copperbowl/iocage/download/12.0-RELEASE (zfs, local, noatime, nfsv4acls)
august/iocage/images on /copperbowl/iocage/images (zfs, local, noatime, nfsv4acls)
august/iocage/jails on /copperbowl/iocage/jails (zfs, local, noatime, nfsv4acls)
august/iocage/jails/jbrowsers on /copperbowl/iocage/jails/jbrowsers (zfs, local, noatime, nfsv4acls)
august/iocage/jails/jbrowsers/root on /copperbowl/iocage/jails/jbrowsers/root (zfs, local, noatime, nfsv4acls)
august/iocage/log on /copperbowl/iocage/log (zfs, local, noatime, nfsv4acls)
august/iocage/releases on /copperbowl/iocage/releases (zfs, local, noatime, nfsv4acls)
august/iocage/releases/12.0-RELEASE on /copperbowl/iocage/releases/12.0-RELEASE (zfs, local, noatime, nfsv4acls)
august/iocage/releases/12.0-RELEASE/root on /copperbowl/iocage/releases/12.0-RELEASE/root (zfs, local, noatime, nfsv4acls)
august/iocage/templates on /copperbowl/iocage/templates (zfs, local, noatime, nfsv4acls)
august/jails on /jails (zfs, local, noatime, nfsv4acls)
august/jails/12 on /jails/12 (zfs, local, noatime, nfsv4acls)
august/jails/13 on /jails/13 (zfs, local, noatime, nfsv4acls)
august/mkjail on /copperbowl/mkjail (zfs, local, noatime, nfsv4acls)
august/mkjail/12.0-RELEASE on /copperbowl/mkjail/12.0-RELEASE (zfs, local, noatime, nfsv4acls)
august/poudriere on /copperbowl/poudriere (zfs, local, noatime, nfsv4acls)
august/poudriere/data on /usr/local/poudriere/data (zfs, local, noatime, nfsv4acls)
august/poudriere/data/.m on /usr/local/poudriere/data/.m (zfs, local, noatime, nfsv4acls)
august/poudriere/data/cache on /usr/local/poudriere/data/cache (zfs, local, noatime, nfsv4acls)
august/poudriere/data/logs on /usr/local/poudriere/data/logs (zfs, local, noatime, nfsv4acls)
august/poudriere/data/packages on /usr/local/poudriere/data/packages (zfs, local, noatime, nfsv4acls)
august/poudriere/data/wrkdirs on /usr/local/poudriere/data/wrkdirs (zfs, local, noatime, nfsv4acls)
august/poudriere/jails on /copperbowl/poudriere/jails (zfs, local, noatime, nfsv4acls)
august/poudriere/jails/example on /usr/local/poudriere/jails/example (zfs, local, noatime, nfsv4acls)
august/poudriere/jails/main on /usr/local/poudriere/jails/main (zfs, local, noatime, nfsv4acls)
august/poudriere/ports on /copperbowl/poudriere/ports (zfs, local, noatime, nfsv4acls)
august/poudriere/ports/default on /copperbowl/poudriere/ports/default (zfs, local, noatime, nfsv4acls)
august/poudriere/ports/portoverlay on /copperbowl/poudriere/ports/portoverlay (zfs, local, noatime, nfsv4acls)
august/usr/home on /usr/home (zfs, local, noatime, nfsv4acls)
august/usr/ports on /usr/ports (zfs, local, noatime, nosuid, nfsv4acls)
august/usr/src on /usr/src (zfs, local, noatime, nfsv4acls)
august/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
august/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
august/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
august/var/mail on /var/mail (zfs, local, nfsv4acls)
august/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
august/vm-bhyve on /copperbowl/vm-bhyve (zfs, local, noatime, nfsv4acls)
devfs on /compat/linux/dev (devfs)
devfs on /compat/ubuntu/dev (devfs)
devfs on /dev (devfs)
devfs on /usr/local/poudriere/data/.m/main-default/01/dev (devfs)
devfs on /usr/local/poudriere/data/.m/main-default/ref/dev (devfs)
fdescfs on /compat/linux/dev/fd (fdescfs)
fdescfs on /compat/ubuntu/dev/fd (fdescfs)
fdescfs on /dev/fd (fdescfs)
fdescfs on /usr/local/poudriere/data/.m/main-default/01/dev/fd (fdescfs)
fdescfs on /usr/local/poudriere/data/.m/main-default/ref/dev/fd (fdescfs)
linprocfs on /compat/linux/proc (linprocfs, local)
linprocfs on /compat/ubuntu/proc (linprocfs, local)
linprocfs on /usr/local/poudriere/data/.m/main-default/01/compat/linux/proc (linprocfs, local)
linprocfs on /usr/local/poudriere/data/.m/main-default/ref/compat/linux/proc (linprocfs, local)
linsysfs on /compat/linux/sys (linsysfs, local)
linsysfs on /compat/ubuntu/sys (linsysfs, local)
procfs on /proc (procfs, local)
procfs on /usr/local/poudriere/data/.m/main-default/01/proc (procfs, local)
procfs on /usr/local/poudriere/data/.m/main-default/ref/proc (procfs, local)
tmpfs on /compat/linux/dev/shm (tmpfs, local)
tmpfs on /compat/ubuntu/dev/shm (tmpfs, local)
tmpfs on /tmp (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/01 (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/01/.p (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/01/usr/local (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/ref (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/ref/.p (tmpfs, local)
tmpfs on /usr/local/poudriere/data/.m/main-default/ref/var/db/ports (tmpfs, local)
root@mowa219-gjp4-8570p-freebsd:~ #

Might the utility also show I/O per device? Akin to gstat -op but as a graph.
 
joske

Thanks for all suggestions. I'll summarize my understanding so far:
* with getmntinfo(3) we get the mounted filesystems; however, because it's ZFS, we have no idea which disk each one comes from
* with the devstat API we get the I/O stats per *disk* (not per mountpoint/filesystem): nvd0, ada1, ada2, ada3, cd0 in my case

So what's missing is to know how the (zfs) mountpoints map to the disks. Is there a sysctl to find out which disks are in the zpool?
 
joske

When I run zpool status, it returns some strange disk id that has no resemblance to any device node:

[jos@bsd ~]$ zpool status zroot
  pool: zroot
 state: ONLINE
config:

        NAME                             STATE     READ WRITE CKSUM
        zroot                            ONLINE       0     0     0
          diskid/DISK-S21JNXAG659300Dp2  ONLINE       0     0     0
 

Alain De Vos

Son of Beastie

Reaction score: 869
Messages: 2,826

I just found a way:
Code:
sysctl -a kern.geom.conftxt | grep zfs
Can be used together with
Code:
zpool list -v
 

covacat

Daemon

Reaction score: 515
Messages: 1,040

you can get statistics directly per dataset, without caring what disks are underneath
kstat.zfs.ntank.dataset.objset-0x12f.nread: 829088057510
kstat.zfs.ntank.dataset.objset-0x12f.reads: 4146119331
kstat.zfs.ntank.dataset.objset-0x12f.nwritten: 99409239242
kstat.zfs.ntank.dataset.objset-0x12f.writes: 160826401
kstat.zfs.ntank.dataset.objset-0x12f.dataset_name: ntank/xdb
 
joske

I just found a way:
Code:
sysctl -a kern.geom.conftxt | grep zfs
Can be used together with
Code:
zpool list -v
[jos@bsd ~]$ sysctl -a kern.geom.conftxt | grep zfs
2 PART diskid/DISK-S21JNXAG659300Dp2 78299267072 512 i 2 o 282624 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b

[jos@bsd ~]$ zpool list -v
NAME                              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot                            72.5G  10.6G  61.9G        -         -    1%    14%  1.00x  ONLINE  -
  diskid/DISK-S21JNXAG659300Dp2  72.5G  10.6G  61.9G        -         -    1%  14.7%      -  ONLINE

Still does not give me the physical disk, nor a way to link what devstat(3) gives me:
2021/11/16 (22:25:28) | DEBUG: dev nvd0 read=7202304 write=0
2021/11/16 (22:25:28) | DEBUG: dev ada0 read=1099911680 write=2285356032
2021/11/16 (22:25:28) | DEBUG: dev ada1 read=37595136 write=0
2021/11/16 (22:25:28) | DEBUG: dev ada2 read=5677568 write=0
2021/11/16 (22:25:28) | DEBUG: dev cd0 read=0 write=0
 
joske

you can get statistics directly per dataset, without caring what disks are underneath
kstat.zfs.ntank.dataset.objset-0x12f.nread: 829088057510
kstat.zfs.ntank.dataset.objset-0x12f.reads: 4146119331
kstat.zfs.ntank.dataset.objset-0x12f.nwritten: 99409239242
kstat.zfs.ntank.dataset.objset-0x12f.writes: 160826401
kstat.zfs.ntank.dataset.objset-0x12f.dataset_name: ntank/xdb
This looks like it would work: the dataset name maps to the device we get back from getmntinfo(3). However, how can I call this sysctl from C++? What kind of input/output does it expect?
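
On the input/output question, here is a minimal sketch. It assumes the numeric dataset leaves (nread, reads, nwritten, writes) are exposed as unsigned 64-bit values and dataset_name as a NUL-terminated string. That layout is an assumption and is not confirmed in this thread; `sysctl -t <name>` prints the type the kernel reports. The names `read_u64`, `read_string` and `SysctlByName` are hypothetical. The sysctl function is passed in as a parameter, so the logic can also be tried with a stand-in on systems without sysctlbyname(3).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Same shape as FreeBSD's sysctlbyname(3): name, oldp, oldlenp, newp, newlen.
using SysctlByName = int (*)(const char*, void*, std::size_t*, const void*, std::size_t);

// Read one numeric leaf, e.g. "kstat.zfs.<pool>.dataset.objset-<id>.nread".
// Assumption: the leaf is an unsigned 64-bit counter.
std::optional<std::uint64_t> read_u64(const std::string& oid, SysctlByName fn) {
    assert(fn != nullptr);
    std::uint64_t value = 0;
    std::size_t len = sizeof(value);
    if (fn(oid.c_str(), &value, &len, nullptr, 0) != 0) return std::nullopt;
    if (len != sizeof(value)) return std::nullopt;  // not a 64-bit leaf
    return value;
}

// Read a string leaf, e.g. "...dataset_name": ask for the size first,
// then fetch into a buffer of that size.
std::optional<std::string> read_string(const std::string& oid, SysctlByName fn) {
    assert(fn != nullptr);
    std::size_t len = 0;
    if (fn(oid.c_str(), nullptr, &len, nullptr, 0) != 0 || len == 0) return std::nullopt;
    std::string buf(len, '\0');
    if (fn(oid.c_str(), buf.data(), &len, nullptr, 0) != 0) return std::nullopt;
    buf.resize(len);
    while (!buf.empty() && buf.back() == '\0') buf.pop_back();  // drop the NUL terminator
    return buf;
}
```

On FreeBSD, with `<sys/types.h>` and `<sys/sysctl.h>` included, the call would be along the lines of `read_u64("kstat.zfs.ntank.dataset.objset-0x12f.nread", ::sysctlbyname)`.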
 
joske

man 3 sysctl ?
That does not help, unfortunately. I see in the source of htop that they read some arcstat value directly (this is just a number). But I have no clue how to do the same for dataset values. It probably returns a pointer to some kind of struct, but I can't find which. I also can't find anything with Google; it seems like no one has done this before? :-/

When you search for kstat you find Solaris stuff.
 

Alain De Vos


And something like
Code:
sysctl -a | grep kstat | grep dataset | egrep "dataset_name|nread" | sort
You can invent other queries
 
joske

And something like
Code:
sysctl -a | grep kstat | grep dataset | egrep "dataset_name|nread" | sort
You can invent other queries
Yes, obviously I could fork sysctl and parse its output. But I'm first looking for a programmatic way to do this.
In the btop sources for Linux, nearly everything is parsing the output of some command or reading from /sys or /proc. In the FreeBSD collect functions, *everything* so far is programmatic (using a sysctl or some kernel API call).

I thought this was one thing where the FreeBSD people make fun of Linux? ;-)
 

covacat


you can look at the source of sysctl(8) to see how to walk the tree and parse return values, get return types
 

Alain De Vos


Linux likes "filesystem trees". But people writing USB drivers for it know it is not always the best idea.
For FreeBSD sysctl you must program a form of tree traversal, so it has the same tree-like complexity.
Normally you need to find out how many children a node has and how to query them,
and do this in a recursive-like way.
 
joske

you can look at the source of sysctl(8) to see how to walk the tree and parse return values, get return types
I actually already did ;-). As far as I understand it, it only deals with leaf nodes, so it does not need to care about which struct pointers to use; it only cares about primitive types. Meanwhile, I know for a fact that many of the values returned by sysctl -a can be obtained programmatically via some mib/struct (I already call many of them in the btop sources).
 
joske

Linux likes "filesystem trees". But people writing USB drivers for it know it is not always the best idea.
For FreeBSD sysctl you must program a form of tree traversal, so it has the same tree-like complexity.
Normally you need to find out how many children a node has and how to query them,
and do this in a recursive-like way.
True to some extent; however, I think it would be much nicer to get some array of structs back (pointer to a struct + a length) and just walk that, rather than parse the output of sysctl. String handling in C is not fun.
 

covacat


sysctl -a actually walks the whole mib tree and prints the values according to the detected type
 

Alain De Vos


I just had a look at /usr/src/sys/sys/sysctl.h
And it contains:
Code:
/* 
 * This describes one "oid" in the MIB tree.  Potentially more nodes can 
 * be hidden behind it, expanded by the handler. 
 */ 
struct sysctl_oid { 
    struct sysctl_oid_list oid_children; 
    struct sysctl_oid_list *oid_parent; 
    SLIST_ENTRY(sysctl_oid) oid_link; 
    int      oid_number; 
    u_int        oid_kind; 
    void        *oid_arg1; 
    intmax_t     oid_arg2; 
    const char  *oid_name; 
    int     (*oid_handler)(SYSCTL_HANDLER_ARGS); 
    const char  *oid_fmt; 
    int      oid_refcnt; 
    u_int        oid_running; 
    const char  *oid_descr; 
    const char  *oid_label; 
};
How to use it, i have totally no clue...
 

covacat


so from the mountpoint via statfs you can get the pool name and the dataset name
then walk sysctl kstat.zfs.<poolname> until you find a dataset name that matches yours
then extract writes, reads, nwritten etc
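
The matching step described above could be sketched roughly as follows. It assumes the mount source reported for a ZFS filesystem (f_mntfromname) is the dataset name, as the `mount` output earlier in the thread suggests, and that the pool name is the part before the first '/'. `pool_of`, `dataset_subtree`, `find_objset` and `ObjsetNames` are hypothetical names; collecting the objset-to-name pairs from sysctl is left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// "ntank/xdb" -> "ntank"; a pool's root dataset ("august") is returned unchanged.
std::string pool_of(const std::string& dataset) {
    return dataset.substr(0, dataset.find('/'));
}

// sysctl subtree that holds the per-dataset counters of one pool.
std::string dataset_subtree(const std::string& pool) {
    return "kstat.zfs." + pool + ".dataset";
}

// objset key ("objset-0x12f") -> dataset_name, gathered while walking the subtree.
using ObjsetNames = std::map<std::string, std::string>;

// Find the objset key whose dataset_name equals the mount source.
std::optional<std::string> find_objset(const ObjsetNames& names, const std::string& dataset) {
    for (const auto& [objset, name] : names)
        if (name == dataset) return objset;
    return std::nullopt;
}
```

Once the objset key is known, the counter names are that subtree plus the key plus a field such as `nread` or `writes`.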
 
joske

I just had a look at /usr/src/sys/sys/sysctl.h … struct sysctl_oid { … }; …
How to use it, i have totally no clue...
Indeed, this could be used, but then it's probably simpler to parse the output of "sysctl kstat.zfs.zroot.dataset".

Does anyone know if there's a higher level struct that describes the contents of entries like this (objset-0x...)?
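
For the text-parsing route, a minimal sketch, assuming the input is in the `name: value` form shown earlier in the thread and that nread/nwritten are byte counts while reads/writes are operation counts (the sample values suggest this, but it is an assumption). `DatasetStats` and `parse_dataset_kstats` are hypothetical names; how the text is obtained (fork, popen, ...) is not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>

struct DatasetStats {
    std::string name;            // dataset_name, e.g. "ntank/xdb"
    std::uint64_t nread = 0;     // assumed: bytes read
    std::uint64_t reads = 0;     // assumed: read operations
    std::uint64_t nwritten = 0;  // assumed: bytes written
    std::uint64_t writes = 0;    // assumed: write operations
};

// Parse "name: value" lines, grouped by the "objset-0x..." component.
// Lines that do not fit the pattern are skipped.
std::map<std::string, DatasetStats> parse_dataset_kstats(const std::string& text) {
    std::map<std::string, DatasetStats> out;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto sep = line.find(": ");
        if (sep == std::string::npos) continue;
        const std::string oid = line.substr(0, sep);
        const std::string value = line.substr(sep + 2);

        const auto obj = oid.find(".objset-");
        if (obj == std::string::npos) continue;
        const auto dot = oid.find('.', obj + 1);
        if (dot == std::string::npos) continue;
        const std::string objset = oid.substr(obj + 1, dot - obj - 1);
        const std::string field = oid.substr(dot + 1);

        DatasetStats& st = out[objset];
        if (field == "dataset_name") { st.name = value; continue; }

        std::uint64_t n = 0;
        try {
            n = std::stoull(value);
        } catch (const std::exception&) {
            continue;  // non-numeric value in a field this sketch does not track
        }
        if (field == "nread") st.nread = n;
        else if (field == "reads") st.reads = n;
        else if (field == "nwritten") st.nwritten = n;
        else if (field == "writes") st.writes = n;
    }
    return out;
}
```

The counters are cumulative, so a per-second figure would come from the difference between two samples divided by the sampling interval.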
 

covacat


zfs get objsetid dataset returns that number

kstat.zfs.ntank.dataset.objset-0x46.dataset_name: ntank/fs
ntank/fs objsetid 70
70 is 0x46

but taking this path will make you link / include zfs stuff
if you don't already have them in your project you may better walk the mib
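
If the objsetid is already known by some other means, the decimal-to-hex relationship above (70 is 0x46) is enough to build the sysctl name directly. A small sketch; `dataset_oid` is a hypothetical helper, and the name layout is taken from the sample output in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Build "kstat.zfs.<pool>.dataset.objset-0x<hex id>.<field>" from the pool
// name, the numeric objsetid and the field name.
std::string dataset_oid(const std::string& pool, std::uint64_t objsetid,
                        const std::string& field) {
    std::ostringstream os;
    os << "kstat.zfs." << pool << ".dataset.objset-0x"
       << std::hex << objsetid  // lowercase hex without leading zeros
       << '.' << field;
    return os.str();
}
```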
 

grahamperrin


Not in answer to the original question, but FYI:

When I run zpool status, it returns some strange disk id that has no resemblance to any device node: …

Below, my gpt/cache-august is the cache-august partition label in the GPT partitioning scheme:

Code:
% zpool status august
  pool: august
 state: ONLINE
  scan: scrub repaired 0B in 02:45:23 with 0 errors on Thu Sep 16 05:48:48 2021
config:

        NAME                STATE     READ WRITE CKSUM
        august              ONLINE       0     0     0
          ada0p3.eli        ONLINE       0     0     0
        cache
          gpt/cache-august  ONLINE       0     0     0
          gpt/duracell      ONLINE       0     0     0

errors: No known data errors
% gpart show
=>        40  1953525088  ada0  GPT  (932G)
          40      532480     1  efi  (260M)
      532520        2008        - free -  (1.0M)
      534528    33554432     2  freebsd-swap  (16G)
    34088960  1919434752     3  freebsd-zfs  (915G)
  1953523712        1416        - free -  (708K)

=>      34  60437425  da0  GPT  (29G)
        34  60437425    1  freebsd-zfs  (29G)

=>      34  32358333  da1  GPT  (15G)
        34      2014       - free -  (1.0M)
      2048  32356319    1  freebsd-zfs  (15G)

% gpart show -l
=>        40  1953525088  ada0  GPT  (932G)
          40      532480     1  efiboot0  (260M)
      532520        2008        - free -  (1.0M)
      534528    33554432     2  swap0  (16G)
    34088960  1919434752     3  zfs0  (915G)
  1953523712        1416        - free -  (708K)

=>      34  60437425  da0  GPT  (29G)
        34  60437425    1  cache-august  (29G)

=>      34  32358333  da1  GPT  (15G)
        34      2014       - free -  (1.0M)
      2048  32356319    1  duracell  (15G)

%

HTH

<https://www.freebsd.org/cgi/man.cgi?query=gpart&sektion=8&manpath=FreeBSD>
 
joske

I've implemented 'regular' filesystems using devstat API (a bit special as this gives per block device not per partition/mountpoint, so several mountpoints can match). And I've implemented the ZFS filesystems with sysctl output parsing.
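
To illustrate why several mountpoints can match one devstat entry, here is a rough sketch of mapping a mount source to a disk name. It is a heuristic under stated assumptions, not the btop implementation: the source is expected to look like `/dev/<letters><digits>...` (for example `/dev/ada0p2`), and label-based sources such as `/dev/gpt/...` or `/dev/diskid/...` are simply not resolved. `devstat_name` and `mounts_by_disk` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// "/dev/ada0p2" -> "ada0": keep the driver letters plus the unit digits.
std::optional<std::string> devstat_name(const std::string& mnt_from) {
    const std::string prefix = "/dev/";
    if (mnt_from.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    const std::size_t start = prefix.size();
    std::size_t i = start;
    while (i < mnt_from.size() && std::isalpha(static_cast<unsigned char>(mnt_from[i]))) ++i;
    const std::size_t digits = i;
    while (i < mnt_from.size() && std::isdigit(static_cast<unsigned char>(mnt_from[i]))) ++i;
    if (digits == start || i == digits) return std::nullopt;  // no letters or no unit number
    return mnt_from.substr(start, i - start);
}

// Group mountpoints by the disk their source resolves to.
// Each input pair is (mount source, mountpoint).
std::map<std::string, std::vector<std::string>> mounts_by_disk(
        const std::vector<std::pair<std::string, std::string>>& mounts) {
    std::map<std::string, std::vector<std::string>> out;
    for (const auto& [source, mountpoint] : mounts)
        if (auto disk = devstat_name(source)) out[*disk].push_back(mountpoint);
    return out;
}
```

With this grouping, every mountpoint under the same disk would show that disk's counters, which is the limitation mentioned above.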

Can anyone have a look if this makes any sense? See the collect_disk() method at line 499 here:
 