Investigating why restic runs "slow" without much apparent resource use or pressure

I'm new to eyeballing `top` on FreeBSD, and I'm facing a situation where nothing really sticks out, yet something still seems off.

I'm running `restic`, which is performing its initial backup and therefore has to read and checksum every file. It is of course possible that restic itself is misbehaving, but I would like to know whether that is what I should investigate, or whether there is some more obvious sign in the system's resource usage that I'm missing.

Edit: maybe I missed something obvious. Perhaps it was uploading to cloud storage interleaved with the file scanning, which I had assumed would happen as a separate second step. Is there a tool I could use to get statistics about open connections and how much data has been transferred on them? `sockstat` gives some information, but it doesn't seem to keep a running sum of transfer (something I recall Linux's iotop having).

Code:
freebsd-version (and -k): 14.3-RELEASE-p7

Code:
last pid: 56398;  load averages:  0.04,  0.15,  0.11  up 0+14:36:20    01:12:56
48 processes:  1 running, 47 sleeping
CPU:  0.1% user,  0.1% nice,  0.1% system,  0.0% interrupt, 99.7% idle
Mem: 283M Active, 899M Inact, 18M Laundry, 6325M Wired, 223M Free
ARC: 5694M Total, 3951M MFU, 1687M MRU, 1677K Anon, 22M Header, 32M Other
     5387M Compressed, 7935M Uncompressed, 1.47:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
47595 root         16  23    0  1748M   487M uwait    5   2:17   5.76% restic

Code:
vmstat 5
 procs    memory    page                      disks       faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ada0 ada1   in   sy   cs us sy id
 0  0  0 7.6G 314M   51   1   0   0   86   37    0    0  143  830 1.2k  0  0 99
 0  0  0 7.6G 314M    2   0   0   0    0  107    0   21 1.2k 3.9k 6.0k  0  0 99
 0  0  0 7.6G 314M    0   0   0   0    0  105    0   17  951 2.6k 4.5k  0  0 99
 0  0  0 7.6G 314M    0   0   0   0    0  105    0    0  393  743 1.6k  0  0 99
 0  0  0 7.6G 304M    3   0   0   0    0  105    6   56  676 2.6k 4.4k  1  0 98
 0  0  0 7.6G 304M    0   0   0   0    0  104    0   15  400  978 2.1k  0  0 99
 0  0  0 7.6G 304M    0   0   0   0    0  105    0   16  584 2.0k 2.8k  0  0 99
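
Averaging the interval rows of the `vmstat` output above puts numbers on "nothing sticks out": ada1 sees about 21 operations per second and the CPUs are roughly 99% idle. Below is a minimal Python sketch over the pasted text. It assumes the first data row is the since-boot summary that `vmstat` prints before the interval samples, so it is skipped by default; `column_means` is a hypothetical helper name.

```python
from statistics import mean

# The `vmstat 5` output pasted above, verbatim.
VMSTAT_OUTPUT = """\
 procs    memory    page                      disks       faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ada0 ada1   in   sy   cs us sy id
 0  0  0 7.6G 314M   51   1   0   0   86   37    0    0  143  830 1.2k  0  0 99
 0  0  0 7.6G 314M    2   0   0   0    0  107    0   21 1.2k 3.9k 6.0k  0  0 99
 0  0  0 7.6G 314M    0   0   0   0    0  105    0   17  951 2.6k 4.5k  0  0 99
 0  0  0 7.6G 314M    0   0   0   0    0  105    0    0  393  743 1.6k  0  0 99
 0  0  0 7.6G 304M    3   0   0   0    0  105    6   56  676 2.6k 4.4k  1  0 98
 0  0  0 7.6G 304M    0   0   0   0    0  104    0   15  400  978 2.1k  0  0 99
 0  0  0 7.6G 304M    0   0   0   0    0  105    0   16  584 2.0k 2.8k  0  0 99
"""


def column_means(text, columns, skip_first_sample=True):
    """Average the named plain-integer columns of a vmstat table.

    Only use columns whose names are unique in the header ("sy" appears
    twice, once under faults and once under cpu).
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[1]  # second line holds the column names
    rows = lines[2:]
    if skip_first_sample:
        rows = rows[1:]  # assumed since-boot summary, not a 5 s interval
    return {c: mean(int(r[header.index(c)]) for r in rows) for c in columns}


for name, value in column_means(VMSTAT_OUTPUT, ["ada0", "ada1", "us", "id"]).items():
    print(f"{name:>5}: {value:6.2f}")
```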

`gstat` shows almost no I/O activity. Running `lsof -p` on restic shows the same few files open for many seconds, which looks like a sign of slow processing, yet the CPU is almost unused. In `top`'s I/O mode (`m`), restic's VCSW (voluntary context switches) is in the 1K-3K range.

On the one hand I would have assumed I'm a bit short on RAM; on the other hand, I don't see any swapping going on. Thank you for any insights!
 
Regarding "Is there a tool I could use to get statistics about open connections and the transfer that happened on them?": use dtrace, then post-process its output with a simple script. dtrace is the gold standard of performance evaluation.
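
A minimal sketch of the post-processing half, in Python. It assumes a hypothetical custom dtrace script that prints one line per socket read or write as `<fd> <direction> <bytes>`; that field layout, the sample trace, and the helper names are illustrative, not a built-in dtrace output format. It keeps a running sum per descriptor and direction, which is the "running sum of transfer" the question asks about.

```python
from collections import defaultdict

# Hypothetical trace format, one event per line: "<fd> <direction> <bytes>",
# e.g. as printed by a custom dtrace script. Not a built-in dtrace format.
SAMPLE_TRACE = """\
7 out 65536
7 out 65536
9 in 1200
7 out 32768
9 out 300
"""


def running_totals(lines):
    """Yield (fd, direction, cumulative_bytes) after each trace event."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        # Skip headers, blank lines and anything not shaped like an event.
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        fd, direction, nbytes = parts
        totals[(fd, direction)] += int(nbytes)
        yield fd, direction, totals[(fd, direction)]


def summarize(lines):
    """Return the final byte count per (fd, direction)."""
    final = {}
    for fd, direction, total in running_totals(lines):
        final[(fd, direction)] = total
    return final


for (fd, direction), total in sorted(summarize(SAMPLE_TRACE.splitlines()).items()):
    print(f"fd {fd:>3} {direction:<3} {total:>10,} bytes")
```

Printing from `running_totals` instead of `summarize` gives a live, iotop-style running sum while the trace is still being produced.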

Another option: use a profiler to sample your program frequently (say, 100 times per second) and record which operation it is in at each sample. Educated guess: since it is a backup program, it is either reading from disk (a lot), writing to disk (hopefully less), or using the network, and it may be doing several of those in parallel. Measuring what fraction of wall-clock time is spent in each of those three main categories is likely to identify the bottleneck.
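
A minimal sketch of the "what fraction of wall-clock time" bookkeeping, in Python. It assumes each sample has already been labelled with the operation the program was in; the labels, the `CATEGORY_OF` mapping, and the sample counts are made-up illustrative values, not output from any particular profiler. Because samples are taken at a fixed rate, the fraction of samples in a category approximates the fraction of wall-clock time.

```python
from collections import Counter

# Hypothetical mapping from the operation seen in a sample to a category.
# Real classification needs more context (e.g. whether a read() is on a
# file or a socket); these label names are illustrative assumptions.
CATEGORY_OF = {
    "file_read": "disk read",
    "file_write": "disk write",
    "socket_send": "network",
    "socket_recv": "network",
}


def time_fractions(samples):
    """Fraction of samples (approx. wall-clock time) in each category."""
    counts = Counter(CATEGORY_OF.get(s, "other") for s in samples)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


# 100 samples/s for 2 s; made-up data purely for illustration.
samples = ["socket_send"] * 150 + ["file_read"] * 30 + ["idle_wait"] * 20
for category, fraction in sorted(time_fractions(samples).items(), key=lambda kv: -kv[1]):
    print(f"{category:<10} {fraction:6.1%}")
```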

Anecdote: my homebrew backup program takes quite consistently 270-280 seconds for its hourly run. It is almost entirely CPU-limited, because it is written in Python, single-threaded, and uses SQLite as its database. On most of its hourly runs, it backs up practically nothing. On the runs where it backs up something (or a lot), it quickly becomes I/O-limited. I could speed that part up by a factor of roughly 2 by making reading and writing happen in parallel (they go to different disk drives), but I haven't felt the urge to invest the 2 or 3 weekends it would take. I attempted a rewrite in a fast compiled language (Rust) and abandoned it after a week or two of learning Rust failed to make me productive and happy. And I'm not going to do it in C++; life is too short for that.
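
The "reading and writing in parallel" idea can be sketched as a two-stage producer/consumer pipeline: one thread reads chunks into a bounded queue while the main thread writes them out. This is a generic Python sketch, not the poster's backup program; `pipelined_copy`, the chunk size, and the queue depth are illustrative assumptions. Whether it gets anywhere near the roughly 2x mentioned depends on source and destination really being independent drives.

```python
import io
import queue
import threading

_DONE = object()  # sentinel: reader has finished (or failed)


def pipelined_copy(src, dst, chunk_size=1 << 20, depth=4):
    """Copy src to dst, overlapping reads and writes in two threads.

    While the main thread writes chunk N, the reader thread is already
    fetching chunk N+1. The bounded queue (at most `depth` chunks) caps
    memory use and makes a fast reader wait for a slow writer.
    """
    q = queue.Queue(maxsize=depth)
    errors = []

    def reader():
        try:
            while chunk := src.read(chunk_size):
                q.put(chunk)
        except BaseException as exc:  # hand read errors to the main thread
            errors.append(exc)
        finally:
            q.put(_DONE)  # always unblock the writer loop

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    written = 0
    while (chunk := q.get()) is not _DONE:
        dst.write(chunk)
        written += len(chunk)
    t.join()
    if errors:
        raise errors[0]
    return written


# Demo with in-memory files; for real use, pass files opened in "rb"/"wb".
data = bytes(range(256)) * 10_000
src, dst = io.BytesIO(data), io.BytesIO()
print(pipelined_copy(src, dst, chunk_size=64 * 1024), dst.getvalue() == data)
```

Threads suffice here despite Python's GIL, because CPython releases it during blocking file reads and writes, so the two drives can genuinely be busy at the same time.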
 