Other Root drive read-only after boot

Greetings, all! We support lots of FreeBSD servers set up as dual-root nanobsd installs, with the root filesystem mounted read-only. The filesystem where configuration is persisted is mounted read-write only temporarily, while changes are being written. That has worked great for years. We have systems that started on 10.4, went through many upgrades, and are now running 14.3. Anyway, I'm not sure that has any connection with today's issue.

Well, after today's upgrade (13.2 -> 14.3), the root disk, where the OS images are, turned out to be read-only. Not the filesystem; the disk itself. For example, gpart(8) fails to create a new partition (gpart add -t freebsd-ufs ada0) with gpart: geom 'ada0': Operation not permitted.

Here is a bit of info:
Code:
# freebsd-version -ku
14.3-RELEASE-p8
14.3-RELEASE-p8

# sysctl kern.securelevel
kern.securelevel: -1

# camcontrol devlist
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass0)
<KINGSTON SV300S37A120G 60AABBF0>  at scbus7 target 0 lun 0 (pass1,ada0)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus15 target 0 lun 0 (ses1,pass2)
<INTEL SSDPE2KE032T8 VDV10170>     at scbus16 target 0 lun 1 (pass3,nda0)
...

# diskinfo -v ada0
ada0
        512             # sectorsize
        120034123776    # mediasize in bytes (112G)
        234441648       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        232581          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        KINGSTON SV300S37A120G  # Disk descr.
        50026***        # Disk ident.
        ahcich6         # Attachment
        id1,enc@n306***/type@0/slot@1/elmdesc@Slot_00 # Physical path
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM
        Not_Zoned       # Zone Mode

# smartctl -a /dev/ada0
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.3-RELEASE-p8 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A120G
Serial Number:    50026***
LU WWN Device Id: 5 0026b7 66700d99f
Firmware Version: 60AABBF0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database 7.3/5528
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 13 16:57:37 2026 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0025) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/4646756
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   075   075   000    Old_age   Always       -       21968h+18m+19.490s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       71
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   029   041   000    Old_age   Offline      -       29 (Min/Max 15/41)
194 Temperature_Celsius     0x0022   029   041   000    Old_age   Always       -       29 (Min/Max 15/41)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/4646756
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/4646756
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/4646756
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       4294967296
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       12
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       16
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       16
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       5
244 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      -       131072

SMART Error Log not supported

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

# dmesg | grep ada0
ada0 at ahcich6 bus 0 scbus7 target 0 lun 0
ada0: <KINGSTON SV300S37A120G 60AABBF0> ATA8-ACS SATA 3.x device
ada0: Serial Number 50026***
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 114473MB (234441648 512 byte sectors)
ses1: ada0,pass1 in 'Slot 00', SATA Slot: scbus7 target 0

# mount
/dev/ufs/cdnrootfs2 on / (ufs, local, read-only, reads: sync 3153 async 16, fsid 22258f69d9183802, vnodes: count 1650 )
devfs on /dev (devfs, fsid 00ff007171000000, vnodes: count 155 )
tmpfs on /etc (tmpfs, local, fsid 02ff008787000000, vnodes: count 1721 )
tmpfs on /var (tmpfs, local, fsid 03ff008787000000, vnodes: count 173 )
tmpfs on /srv (tmpfs, local, fsid 04ff008787000000, vnodes: count 296 )
storage/vol1 on /srv/vol1 (zfs, local, noatime, nfsv4acls, fsid da5b215ede9bb828, vnodes: count 244864 )
tmpfs on /srv/*** (tmpfs, local, fsid 05ff008787000000, vnodes: count 291 )
tmpfs on /srv/*** (tmpfs, local, fsid 06ff008787000000, vnodes: count 773 )
tmpfs on /srv/*** (tmpfs, local, fsid 07ff008787000000, vnodes: count 33 )
tmpfs on /srv/*** (tmpfs, local, fsid 08ff008787000000, vnodes: count 3 )
tmpfs on /srv/wwwcache.ram (tmpfs, local, fsid 09ff008787000000, vnodes: count 2 )
/dev/ufs/cdndata on /srv/data (ufs, local, read-only, reads: sync 3 async 0, fsid ec2a9464fe5d78f8, vnodes: count 1 )
/dev/ufs/cdncfg on /cfg (ufs, local, noatime, noexec, nosuid, read-only, synchronous, reads: sync 278 async 0, fsid ec2a9464494d2f4a, vnodes: count 326 )

# sysctl kern.cam.ada.0
kern.cam.ada.0.trim_ticks: 0
kern.cam.ada.0.trim_goal: 0
kern.cam.ada.0.sort_io_queue: 0
kern.cam.ada.0.rotating: 0
kern.cam.ada.0.unmapped_io: 1
kern.cam.ada.0.flags: 0x1be3bde<CAN_48BIT,CAN_FLUSHCACHE,CAN_NCQ,CAN_DMA,WAS_OTAG,CAN_TRIM,OPEN,SCTX_INIT,CAN_POWERMGT,CAN_DMA48,CAN_LOG,CAN_WCACHE,CAN_RAHEAD,PROBED,ANNOUNCED,DIRTY,PIM_ATA_EXT,UNMAPPEDIO>
kern.cam.ada.0.max_seq_zones: 0
kern.cam.ada.0.optimal_nonseq_zones: 0
kern.cam.ada.0.optimal_seq_zones: 0
kern.cam.ada.0.zone_support: None
kern.cam.ada.0.zone_mode: Not Zoned
kern.cam.ada.0.write_cache: -1
kern.cam.ada.0.read_ahead: -1
kern.cam.ada.0.trim_lbas: 8
kern.cam.ada.0.trim_ranges: 1
kern.cam.ada.0.trim_count: 1
kern.cam.ada.0.delete_method: DSM_TRIM

The server itself is working fine, apart from the fact that the new version cannot be committed -- neither the boot loader nor other configuration changes can be saved.

ZFS is also in use, but on NVMes, not on this disk. I cannot see anything holding the drive open with fstat(1). I could use some help right now :) How come this drive is read-only? Can you share some tips to help diagnose the problem? And thank you for your time!
 
It is not an issue of filesystem being read-only, but the drive itself (well, the SSD) being read-only. Attempting to mount a filesystem read-write (via mount -uw /, for example) also gives Operation not permitted.
 
Code:
# gpart show -lp
=>       40  234441568    ada0  GPT  (112G)
         40       1024  ada0p1  cdnboot  (512K)
       1064        984          - free -  (492K)
       2048    2097152  ada0p2  cdnrootfs1  [bootme]  (1.0G)
    2099200    2097152  ada0p3  cdnrootfs2  (1.0G)
    4196352     262144  ada0p4  cdncfg  (128M)
    4458496    2097152  ada0p5  cdndata  (1.0G)
    6555648  227885960          - free -  (109G)

=>       40  234441568    diskid/DISK-50026***  GPT  (112G)
         40       1024  diskid/DISK-50026***p1  cdnboot  (512K)
       1064        984                                  - free -  (492K)
       2048    2097152  diskid/DISK-50026***p2  cdnrootfs1  [bootme]  (1.0G)
    2099200    2097152  diskid/DISK-50026***p3  cdnrootfs2  (1.0G)
    4196352     262144  diskid/DISK-50026***p4  cdncfg  (128M)
    4458496    2097152  diskid/DISK-50026***p5  cdndata  (1.0G)
    6555648  227885960                                  - free -  (109G)

NVMe drives are not partitioned and fully "devoted" to the ZFS.
 
It is posted up there, but I saw no indication of that. I have past experience with failing Intel SSDs and they behaved very differently -- I/O errors, dead-slow performance and tanking IOPS, in the 1-2MB/sec range. This seems more like some logical/kernel lock, not a hardware issue. Is there a tool that shows the read/write status of a CAM device?
 
hm. boot the machine single-user. then do mount -uw / to see if it allows it there.

also you show a smart output, but it doesn't show any tests having been run. we'd run at least a short test to see if that updates the drive status.
 
There are some issues with that. The machine 1) is a production server, 2) can only be accessed remotely (via ssh, no console access), and 3) if restarted, will revert and boot from the other root filesystem (running older FreeBSD and whatnot): because the boot disk is read-only, the bootloader configuration cannot be changed. I'm not in a position to reboot it freely. But I can run some "safe" tests and do some light diagnostics. Ignoring the issue with the read-only boot disk, the server is working fine.
 
ough.

yeah. smartctl -t short /dev/ada0 and then after a few minutes, smartctl -ax /dev/ada0

also, what's dmesg say? especially around the most recent boot? CAM likes to complain in the kernel messages.
 
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/4684334
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   075   075   000    Old_age   Always       -       21971h+00m+15.220s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       71
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   030   041   000    Old_age   Offline      -       30 (Min/Max 15/41)
194 Temperature_Celsius     0x0022   030   041   000    Old_age   Always       -       30 (Min/Max 15/41)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/4684334
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/4684334
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/4684334
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0000   100   100   011    Old_age   Offline      -       4294967296
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       12
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       16
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       16
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       5
244 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      -       131072

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     21970         -
Nothing important in kernel message buffer; there are no errors there at all.
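
For anyone repeating this check on several machines, here is a minimal sketch that counts self-test log entries whose status is anything other than "Completed without error". The function name count_unclean_selftests is made up for illustration, and the sample line is the one from the log above. Note that it also counts tests that are still in progress or were aborted, so a non-zero result is a reason to look at the log, not proof of a failure.

```shell
#!/bin/sh
# count_unclean_selftests (hypothetical helper): read a smartctl self-test
# log on stdin and print how many entries do NOT say "Completed without error".
count_unclean_selftests() {
    awk '/^# *[0-9]+ / && !/Completed without error/ { n++ } END { print n + 0 }'
}

# Sample entry copied from the self-test log posted above.
printf '%s\n' \
    '# 1  Short offline       Completed without error       00%     21970         -' \
    | count_unclean_selftests

# On a live system the input would come from smartctl instead, for example:
#   smartctl -l selftest /dev/ada0 | count_unclean_selftests
```

For the entry posted above this prints 0.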
 
Could be. Shouldn't there be I/O errors, or any errors at all, in the kernel ring buffer? That doesn't match the behavior we've encountered before. Besides, as one can see, there weren't many writes to this SSD. Writes are needed only when 1) updating the version (1GB of sequential writes) or 2) saving configuration changes (tens of KBytes).
 
This could be GEOM's safety interlock doing its job (preventing certain low-level disk operations from being performed under certain conditions). I'd study the manual page, especially the part about diagnostics.
 
everything we're seeing points to the SSD having a bad day and possibly reporting that it's a read-only device. unsure how to detect this from inside a running system, though. maybe camcontrol identify?
 
What does mount say if you manually mount the partition explicitly with mount -v -o rw? I'd say it should at least mention some reason for getting no write access. I have no drive with this problem to test it.
One similar situation that I remember was a read-only mount caused by a background fsck that was still running after a minor filesystem problem, but you probably would have noticed that.
 
On UFS, the root filesystem will be mounted read-only when there are filesystem errors. If fsck in preen mode cannot fix them, you will need to run fsck by hand.
 
It is a nanobsd system; root file systems are read-only by default (and by design). However it is a drive issue -- physical or logical, but there is no indication for either, apart from this "Operation not permitted". Same error if I try to modify the partition table, which should not have any connection with filesystems.
Code:
# mount -vuw /
mount: /dev/ufs/cdnrootfs2: Operation not permitted
/dev/ufs/cdnrootfs2 on / (ufs, local, read-only, reads: sync 3185 async 34, fsid 22258f69d9183802, vnodes: count 1657 )
 
I'm on team "your SSD is kaput" now. Do this (one command first, then the other one):

Code:
# sysctl kern.geom.disk.ada0.flags
# geom disk list ada0

If either command says WRITEPROTECT, your SSD is most likely kaput and has automatically entered read-only mode as a fail-safe against data loss.
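
If you want to script that check, here is a rough sketch. It only looks for the string WRITEPROTECT in the flags text; the helper name has_writeprotect and the DISK variable are made up for illustration, and ada0 is assumed as the default disk.

```shell
#!/bin/sh
# has_writeprotect (hypothetical helper): succeed if the given flags text
# contains WRITEPROTECT. It only inspects the string passed to it.
has_writeprotect() {
    case "$1" in
        *WRITEPROTECT*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the disk name is ada0 unless DISK is set in the environment.
disk="${DISK:-ada0}"

# Read the GEOM disk flags; an empty result means the sysctl was not readable.
flags="$(sysctl -n "kern.geom.disk.${disk}.flags" 2>/dev/null || true)"

if [ -z "$flags" ]; then
    echo "${disk}: could not read kern.geom.disk.${disk}.flags"
elif has_writeprotect "$flags"; then
    echo "${disk}: WRITEPROTECT is set"
else
    echo "${disk}: WRITEPROTECT is not set"
fi
```

A missing flag only says that GEOM does not consider the disk write-protected; it does not rule out other causes.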
 
Code:
# sysctl kern.geom.disk.ada0.flags
kern.geom.disk.ada0.flags: be<OPEN,CANDELETE,CANFLUSHCACHE,UNMAPPEDBIO,DIRECTCOMPLETION,CANZONE>

# geom disk list ada0
Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 120034123776 (112G)
   Sectorsize: 512
   Mode: r3w0e4
   descr: KINGSTON SV300S37A120G
   lunid: 50026***
   ident: 50026***
   rotationrate: 0
   fwsectors: 63
   fwheads: 16
securelevel is -1.
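
A side note on the Mode line in that output: the string packs three GEOM access counts (read, write, exclusive) for the provider. Here is a small sketch that splits it into named fields so it is easier to compare between machines; the function name parse_geom_mode is made up for illustration.

```shell
#!/bin/sh
# parse_geom_mode (hypothetical helper): split a GEOM access string such as
# "r3w0e4" into its read, write and exclusive counts.
# Prints nothing if the argument does not have the rNwNeN form.
parse_geom_mode() {
    echo "$1" | sed -n \
        's/^r\([0-9][0-9]*\)w\([0-9][0-9]*\)e\([0-9][0-9]*\)$/read=\1 write=\2 exclusive=\3/p'
}

# Mode value copied from the geom disk list output above.
parse_geom_mode r3w0e4

# On a live system the value could be pulled out like this:
#   geom disk list ada0 | awk '$1 == "Mode:" { print $2 }'
```

For the value posted above this prints read=3 write=0 exclusive=4, i.e. nothing currently has the disk open for writing, while four exclusive references are held.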

Code:
borked# truss gpart add -t freebsd-ufs ada0
...
modfind("g_part")                                = 322 (0x142)
openat(AT_FDCWD,"/dev/geom.ctl",O_RDONLY,00)     = 3 (0x3)
ioctl(3,GEOM_CTL,0x3ace5f60a040)                 = 0 (0x0)
close(3)                                         = 0 (0x0)
mmap(0x0,1112,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 64658033741824 (0x3ace5f221000)
minherit(0x3ace5f221000,1112,INHERIT_ZERO)       = 0 (0x0)
getrandom("h\M-v\M^C\M-;X\M-LkH\240+g\M^U"...,40,0) = 40 (0x28)
openat(AT_FDCWD,"/dev/geom.ctl",O_RDONLY,00)     = 3 (0x3)
ioctl(3,GEOM_CTL,0x3ace5f60a000)                 ERR#22 'Invalid argument'
close(3)                                         = 0 (0x0)
gpart: write(2,"gpart: ",7)                              = 7 (0x7)
geom 'ada0'write(2,"geom 'ada0'",11)                     = 11 (0xb)
: write(2,": ",2)                                        = 2 (0x2)
issetugid()                                      = 0 (0x0)
fstatat(AT_FDCWD,"/usr/share/nls/C/libc.cat",0x33e70f4cfa70,0x0) ERR#2 'No such file or directory'
fstatat(AT_FDCWD,"/usr/share/nls/libc/C",0x33e70f4cfa70,0x0) ERR#2 'No such file or directory'
fstatat(AT_FDCWD,"/usr/local/share/nls/C/libc.cat",0x33e70f4cfa70,0x0) ERR#2 'No such file or directory'
fstatat(AT_FDCWD,"/usr/local/share/nls/libc/C",0x33e70f4cfa70,0x0) ERR#2 'No such file or directory'
Operation not permitted
write(2,"Operation not permitted\n",24)          = 24 (0x18)
exit(0x1)
process exit, rval = 1

vm# truss gpart add -t freebsd-ufs ada0
...
modfind("g_part")                                = 322 (0x142)
openat(AT_FDCWD,"/dev/geom.ctl",O_RDONLY,00)     = 3 (0x3)
ioctl(3,GEOM_CTL,0x49fd95c0a040)                 = 0 (0x0)
close(3)                                         = 0 (0x0)
mmap(0x0,1112,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 81353483882496 (0x49fd95821000)
minherit(0x49fd95821000,1112,INHERIT_ZERO)       = 0 (0x0)
getrandom("[\M-=\t};\M-k\M-w\M-9\^Y\M-;\^E"...,40,0) = 40 (0x28)
openat(AT_FDCWD,"/dev/geom.ctl",O_RDONLY,00)     = 3 (0x3)
ioctl(3,GEOM_CTL,0x49fd95c0a000)                 = 0 (0x0)
close(3)                                         = 0 (0x0)
fstat(1,{ mode=crw--w---- ,inode=106,size=0,blksize=4096 }) = 0 (0x0)
ioctl(1,TIOCGETA,0x27cda7602e84)                 = 0 (0x0)
da0p7 added
write(1,"da0p7 added\n",12)                      = 12 (0xc)
exit(0x0)
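
To compare two long truss logs like these without reading them line by line, a filter along the following lines may help. It keeps only the GEOM_CTL ioctl calls that returned an error. The function name failed_geom_ctl is made up for illustration, and the two sample lines are taken from the first trace above.

```shell
#!/bin/sh
# failed_geom_ctl (hypothetical helper): read truss output on stdin and
# print only the GEOM_CTL ioctl lines that ended in an error (ERR#...).
failed_geom_ctl() {
    grep 'GEOM_CTL' | grep 'ERR#' || true
}

# Two lines copied from the trace of the affected machine above.
printf '%s\n' \
    "ioctl(3,GEOM_CTL,0x3ace5f60a040)                 = 0 (0x0)" \
    "ioctl(3,GEOM_CTL,0x3ace5f60a000)                 ERR#22 'Invalid argument'" \
    | failed_geom_ctl

# With saved logs (file names are placeholders):
#   failed_geom_ctl < truss-borked.log
#   failed_geom_ctl < truss-vm.log
```

On the affected machine this leaves the single ERR#22 line, while the VM trace has no failing GEOM_CTL call. Note that truss reports 'Invalid argument' for the ioctl even though gpart itself prints "Operation not permitted".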
 