Write performance on SSD with ZFS

Maybe a silly question, but here goes. I noticed that sometimes writing to the SSD is quite slow, so I copied about 31 GB of image files 3 times in a row, wiping the data before each new copy. Measured overall times:

Code:
8:28 10:56 11:51

So each new copy got slower and slower. I used rsync from a remote server to copy and rm to wipe the data.
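
Each round looked roughly like this (host and paths here are placeholders, not the exact commands):

Code:
$ time rsync -a backup@fileserver:/data/images/ /zroot/test/images/
$ rm -rf /zroot/test/images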

Here is the ZFS configuration; I used all the defaults:

Code:
$ zpool get "all" zroot
NAME   PROPERTY                       VALUE                          SOURCE
zroot  size                           236G                           -
zroot  capacity                       55%                            -
zroot  altroot                        -                              default
zroot  health                         ONLINE                         -
zroot  guid                           14436446622398518569           -
zroot  version                        -                              default
zroot  bootfs                         zroot/ROOT/default             local
zroot  delegation                     on                             default
zroot  autoreplace                    off                            default
zroot  cachefile                      -                              default
zroot  failmode                       wait                           default
zroot  listsnapshots                  off                            default
zroot  autoexpand                     off                            default
zroot  dedupratio                     1.00x                          -
zroot  free                           106G                           -
zroot  allocated                      130G                           -
zroot  readonly                       off                            -
zroot  ashift                         0                              default
zroot  comment                        -                              default
zroot  expandsize                     -                              -
zroot  freeing                        0                              -
zroot  fragmentation                  17%                            -
zroot  leaked                         0                              -
zroot  multihost                      off                            default
zroot  checkpoint                     -                              -
zroot  load_guid                      7033948333118886069            -
zroot  autotrim                       off                            default
zroot  compatibility                  off                            default
zroot  bcloneused                     0                              -
zroot  bclonesaved                    0                              -
zroot  bcloneratio                    1.00x                          -
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           enabled                        local
zroot  feature@large_dnode            enabled                        local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  disabled                       local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             enabled                        local
zroot  feature@project_quota          active                         local
zroot  feature@device_removal         enabled                        local
zroot  feature@obsolete_counts        enabled                        local
zroot  feature@zpool_checkpoint       enabled                        local
zroot  feature@spacemap_v2            active                         local
zroot  feature@allocation_classes     enabled                        local
zroot  feature@resilver_defer         enabled                        local
zroot  feature@bookmark_v2            enabled                        local
zroot  feature@redaction_bookmarks    enabled                        local
zroot  feature@redacted_datasets      enabled                        local
zroot  feature@bookmark_written       enabled                        local
zroot  feature@log_spacemap           active                         local
zroot  feature@livelist               active                         local
zroot  feature@device_rebuild         enabled                        local
zroot  feature@zstd_compress          enabled                        local
zroot  feature@draid                  enabled                        local
zroot  feature@zilsaxattr             disabled                       local
zroot  feature@head_errlog            disabled                       local
zroot  feature@blake3                 disabled                       local
zroot  feature@block_cloning          disabled                       local
zroot  feature@vdev_zaps_v2           disabled                       local


Code:
$ zfs get "all" zroot
NAME   PROPERTY              VALUE                  SOURCE
zroot  type                  filesystem             -
zroot  creation              Mon Mar  6 21:47 2023  -
zroot  used                  130G                   -
zroot  available             98.8G                  -
zroot  referenced            96K                    -
zroot  compressratio         1.40x                  -
zroot  mounted               yes                    -
zroot  quota                 none                   default
zroot  reservation           none                   default
zroot  recordsize            128K                   default
zroot  mountpoint            /zroot                 local
zroot  sharenfs              off                    default
zroot  checksum              on                     default
zroot  compression           lz4                    local
zroot  atime                 off                    local
zroot  devices               on                     default
zroot  exec                  on                     default
zroot  setuid                on                     default
zroot  readonly              off                    default
zroot  jailed                off                    default
zroot  snapdir               hidden                 default
zroot  aclmode               discard                default
zroot  aclinherit            restricted             default
zroot  createtxg             1                      -
zroot  canmount              on                     default
zroot  xattr                 on                     default
zroot  copies                1                      default
zroot  version               5                      -
zroot  utf8only              off                    -
zroot  normalization         none                   -
zroot  casesensitivity       sensitive              -
zroot  vscan                 off                    default
zroot  nbmand                off                    default
zroot  sharesmb              off                    default
zroot  refquota              none                   default
zroot  refreservation        none                   default
zroot  guid                  17429719224973091146   -
zroot  primarycache          all                    default
zroot  secondarycache        all                    default
zroot  usedbysnapshots       0B                     -
zroot  usedbydataset         96K                    -
zroot  usedbychildren        130G                   -
zroot  usedbyrefreservation  0B                     -
zroot  logbias               latency                default
zroot  objsetid              54                     -
zroot  dedup                 off                    default
zroot  mlslabel              none                   default
zroot  sync                  standard               default
zroot  dnodesize             legacy                 default
zroot  refcompressratio      1.00x                  -
zroot  written               96K                    -
zroot  logicalused           176G                   -
zroot  logicalreferenced     42.5K                  -
zroot  volmode               default                default
zroot  filesystem_limit      none                   default
zroot  snapshot_limit        none                   default
zroot  filesystem_count      none                   default
zroot  snapshot_count        none                   default
zroot  snapdev               hidden                 default
zroot  acltype               nfsv4                  default
zroot  context               none                   default
zroot  fscontext             none                   default
zroot  defcontext            none                   default
zroot  rootcontext           none                   default
zroot  relatime              on                     default
zroot  redundant_metadata    all                    default
zroot  overlay               on                     default
zroot  encryption            off                    default
zroot  keylocation           none                   default
zroot  keyformat             none                   default
zroot  pbkdf2iters           0                      default
zroot  special_small_blocks  0                      default

For comparison I did the same experiment with Fedora 39 and Btrfs:

Code:
8:08 8:14 8:33

My question is: why does copying on FreeBSD get so much slower, and can I do anything to improve it?
 
I can see that even your Fedora test gets a little slower, although not by that much. Maybe this is something caused by the SSD itself (perhaps its speed decreases when it gets hotter).
Try repeating the test 20 times or so and see if there is a plateau.

SSDs, unlike HDDs, perform better when you read and write to them in parallel. If the operation is not sufficiently parallel, it may underutilize the SSD's throughput.

My suggestion would be to try copying some large files in parallel - maybe take a few 1 GB files and run a script that copies them all at once, as in the sketch below. Then you can compare the results on FreeBSD and Fedora and we would have more to hang our hats on.
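
Something along these lines, where the file names and target directory are just placeholders:

Code:
$ cat > pcopy.sh <<'EOF'
#!/bin/sh
# copy a handful of large test files in parallel, then wait for all copies
for f in test1.img test2.img test3.img test4.img; do
        cp "$f" /zroot/test/ &
done
wait
EOF
$ time sh pcopy.sh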

In summary, it may be that FreeBSD is a little less parallel in its I/O operations than Fedora (or ZFS than Btrfs), and this could explain the slightly slower copies.
 
One candidate explanation is timing.

After big writes, an SSD needs considerable time to get back to normal performance.

Maybe you left more time between Fedora runs?
 
Thank you for all the suggestions. There were no gaps between writes in either case: write-wipe-repeat. The SSDs are different models but have similar read/write speeds. Fedora is installed on a Kingston while FreeBSD is on a Crucial. Maybe Kingston rulez and Crucial sucks for this kind of silly test :)

I'll keep an eye on disk performance while doing my regular work...
 
Well, I would have expected the slowdown to be even bigger than what you see on FreeBSD :)

But maybe the drives' internal write cache (pseudo-SLC or whatever) wasn't full at 31 GB on either SSD. Do you have the exact model designations?
 
Maybe a silly question, but here goes. I noticed that sometimes writing to the SSD is quite slow, so I copied about 31 GB of image files 3 times in a row, wiping the data before each new copy. Measured overall times:
That raises a couple of questions.
Code:
8:28 10:56 11:51
That's not very significant.

What kind of filesystem? What kind of alignment?

So each new copy got slower and slower. I used rsync from a remote server to copy and rm to wipe the data.
Does the filesystem promote rm to trim?

Code:
zroot  autotrim                       off                            default
Probably you want to switch this one on. Mine are switched on for SSD, and "on" is shown as "default":
Code:
$ zpool get all im | grep autotrim    # SSD
im    autotrim                       on                             default
$ zpool get all bm | grep autotrim   # mechanical
bm    autotrim                       off                            default
Oops - just noticed: not all of my SSD pools are "on" - it seems it doesn't always get set automatically...

But then also: What Kingston? What Crucial?

In general: I am currently restructuring my pools. The behaviour of consumer SSDs on bulk data (like zfs recv), i.e. large sequential writes, is a creepshow.
Brand-new SSD: it takes in almost 50 GB at peak, a stable 485 MB/sec. Wow. Then it drops to 18 MB/sec, with periodic short bursts of speed every 20 sec.(*)

As I mentioned, when my build engine compiles three instances of llvm in parallel, it may push more than 100 MB/sec into swap. There are some SSDs where this does not work well at all and things get badly stuck. There are other SSDs where it works just fine and smoothly. There is not really much difference between them - not in the specs (because you don't get any useful specs anyway), not in the technology (as far as can be determined), not much in the pricing, and not at all in the test reports (I don't know what they are testing, but it doesn't relate to practical operation). Creepshow.

(*) These things can nowadays auto-reconfigure their flash cells between SLC and TLC or QLC. So maybe the behaviour arises because, at first, the device fills its entire capacity in fast SLC mode and, once that is full, drops to normal operation. A strategy to give excellent results in test cases (similar to the diesel emissions scandal). Wouldn't surprise me...
 
Make sure to enable TRIM on your SSD filesystems. Without TRIM, deleted data will need to be "erased" before it can be written again. A filesystem that supports TRIM will send a TRIM request to the SSD for each logically freed block, so that the next time a write is made to that address there is no wait for the SSD to erase it before the data is written.

UFS and ZFS support TRIM. FreeBSD swap also supports TRIM at boot. ext4 on Linux supports TRIM. XFS on Linux may require you to run fstrim occasionally. Word has it that fstrim on Linux XFS is slow.
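
For on-demand trimming (instead of, or in addition to, autotrim) both ZFS and Linux have manual commands; the pool and mountpoint below are just examples:

Code:
# zpool trim zroot        # ZFS: start a manual TRIM of the pool's free space
# zpool status -t zroot   # show per-vdev TRIM progress
# fstrim -v /             # Linux (ext4/XFS): trim free space on a mounted filesystem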

Note: don't run the FreeBSD trim command on a partition or disk. The trim command has no knowledge of the underlying filesystem. It simply erases *everything*.
 
Note: don't run the FreeBSD trim command on a partition or disk. The trim command has no knowledge of the underlying filesystem. It simply erases *everything*.
Yeah, don't do it on a used one! It will just entirely delete that whole filesystem. But do it always on a no longer used partition, before deleting the partition.
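
Something like this, where the device name is only an example and you must be sure the partition really is no longer in use:

Code:
# trim -N /dev/ada0p4    # dry run: report what would be erased, change nothing
# trim -f /dev/ada0p4    # actually erase the whole partition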
 
Probably you want to switch this one on. Mine are switched on for SSD, and "on" is shown as "default":
That really depends on the RELEASE you are running: it can show as 'on' while it's really 'off' -- you need to explicitly set it to 'on'. See PR 264234; this should be fixed in 14.0 (and likely needs to be MFCed to stable/13).
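
On an affected release you can set it explicitly and verify the result, e.g. with the OP's pool name:

Code:
# zpool set autotrim=on zroot
# zpool get autotrim zroot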
 
Wow, thanks for the notice - but this actually appears to be a mess... ;) Let's see:

Code:
# zpool get autotrim
NAME    PROPERTY  VALUE     SOURCE
backup  autotrim  off       default
bm      autotrim  off       default
build   autotrim  on        default
dbappb  autotrim  off       default
dbb     autotrim  off       default
dbintb  autotrim  off       default
gr      autotrim  off       local
ib      autotrim  off       default
ibd     autotrim  off       default
icol    autotrim  off       default
idb     autotrim  off       default
ig      autotrim  off       default
im      autotrim  on        default
lt      autotrim  off       default
media   autotrim  off       default
# sysctl -a | grep autotrim_bytes_written
kstat.zfs.icol.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.ibd.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.ig.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.build.misc.iostats.autotrim_bytes_written: 21879767040
kstat.zfs.ib.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.media.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.lt.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.im.misc.iostats.autotrim_bytes_written: 420148240384
kstat.zfs.idb.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.gr.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.dbintb.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.dbb.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.dbappb.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.bm.misc.iostats.autotrim_bytes_written: 0
kstat.zfs.backup.misc.iostats.autotrim_bytes_written: 0

It seems, at least where it shows "on", it does something... (13.2-RELEASE)
 
Thank you for all the suggestions. There were no gaps between writes in either case: write-wipe-repeat. The SSDs are different models but have similar read/write speeds. Fedora is installed on a Kingston while FreeBSD is on a Crucial. Maybe Kingston rulez and Crucial sucks for this kind of silly test :)

I'll keep an eye on disk performance while doing my regular work...
Well, if you use different devices from different manufacturers, the benchmark is not valid. Maybe it is the hardware that behaves differently and not the OS (performance specs on paper can be quite different from reality).
A valid test would be to run the benchmark on the same hardware, varying only the OS. Otherwise the OS might have nothing to do with the difference you are observing.
 