Other How to determine if SSD activity LEDS are due to a Read or a Write

eepete · Nov 5, 2024

I've just finished a build of a server system that uses SSDs instead of "spinning rust". I have a system I made which runs MySql in a ram disk. rc.d files restore the last dump in a ramdisk on boot, and on system take-down an incremental dump of any tables that changed is stored on the SSD.

I've moved all the log files for my software and the FreeBSD system logs to another mounted Ramdisk (/var/log/ramlog in my case). There are 6 Ramdisks mounted for all sorts of things.
While discussing "why" and "is this a good idea" would be fun, the application is systems deployed in vehicles, for response times in environments with no or intermittent communications. When running, database commits are between 10 and 100 per second. A custom power supply is integrated into the system to help coordinate boots and take-downs, and everything runs on the vehicle battery. The same software runs on a "big" server, which is what I'm working on now. A fun problem I'm happy to discuss once I clear this stumbling block.

With all my various daemons in crontab turned off, I still see disk activity lights on the server flash about every 4 seconds. My IoT and web systems are not active, so there should be nothing dealing with the database. I did not see this SSD activity LED flashing on this server's "digital twin", which is still running V13. But I see it on the new server running 14.1.
When I run 'top' and look at IO (the 'm' command), I can watch it for a long period of time and see no READ or WRITE counts for anything.

Questions:
1) If there was any disk activity to the SSDs, would I see it in TOP ?
2) Is there any way to figure out when the disk activity lights flash what sort of transaction was done ? I really miss some of the early Ethernet based routers I worked on in the 90's where I had LEDs for Link, Xmit, Receive and Collision....
3) Is there any log or other info that I could store in my /var/log/ramlog to figure this out? Because it's a Ramdisk, it will not cause any SSD disk activity LEDs to flash (I've tested this, it's true).

FreeBSD 14.1-STABLE
mysql Ver 8.4.2 for FreeBSD14.0 on amd64 (Source distribution)
Apache24 Server version: Apache/2.4.62 (FreeBSD)
PHP Version => 8.3.11

richardtoohey2 · Nov 5, 2024

Would the SSD be doing any internal firmware-driven tidying up/scanning? Would that make the lights blink if it was that?

13.x was the nvme driver, 14.x is nvd; not sure if any difference there that would cause the behaviour to change (not sure I've got the exact wording right - there's nda, nvd, nvme!)

nvme(4)

nvd(4)

Exact same SSD make & model?

(Sounds like an interesting project!)

VladiBG · Nov 5, 2024

Try with

systat -> :help -> :iostat or :vmstat or :zarc in case of ZFS
gstat
vmstat
iostat

mer · Nov 5, 2024

I think "ZFS or UFS" is a question to answer. They both have some characteristics of "delayed write" (softjournaling, TXGroups) that try to bundle writes together before writing them out. The writing out may occur over time instead of one big write that potentially blocks out other stuff (this part is a bit of speculation on my part).

lgrant · Nov 5, 2024

Would DTrace help with this? I'm still learning DTrace, so by no means am I an expert, but it seems like a probe in the right place might be able to tell you what kind of activity you are dealing with.

VladiBG · Nov 5, 2024

dwatch(1) has predefined profiles built on DTrace, such as the io profile.

DTrace/One-Liners - FreeBSD Wiki

eepete · Nov 5, 2024

Good things to look at, thanks to all.
More system data:
The system runs trim once a week.
The current platform and its digital twin are using 3 Samsung 860 Pro 4 TB SATA 2-bit MLC drives. All 3 are redundant, so total storage is 4 TB; with the "spare space" that I've allocated, that's about 3.6 TB of available storage. The server is a Supermicro 1019-WTR.
File system is ZFS (of course)
Processor is Xeon Gold 16 core 6246R 3.4 GHz 35MB L3 cache, 6 memory lanes
96 GB 3900 MHz ECC ram

The mobile server is an 8-core Atom 2.2 GHz with 64 GB of ECC RAM, two memory lanes, 15 MB L3 with a Samsung 970 EVO Plus 2 TB MLC3. The Atom only supports 2 PCIe lanes, as does the 970. Nice to see Intel put ECC on a smaller processor. Sad that the motherboard only supports 2 lanes into the M.2 stick. It has a custom power supply that can keep this server up even when the vehicle battery voltage drops to 6.4 V when cranking (boost-buck design). A small uP on the supply talks to the server processor to make sure that power up and power down work and the Ramdisk based database is flushed before taking the system down. The power supply is key to this project, it was a "Fun" design. The power supply and its hard-wired connection to the battery make for a smart UPS that can control the system boot and take-down while keeping it informed about the environment and advising the Unix system that it will be going down in (for example) 5 minutes so that users are warned.
It is critical to me that the exact same system/software works on both platforms, with just minor tweaks to the Ramdisk location sizes. No one likes to support two complex systems that are almost the same... The motherboard for this server is the Supermicro A2SDI-8C. All this info just for fun and FYI.

For obvious reasons, software follows hardware changes. It's interesting to see modern operating systems evolve to allow for SSDs. We're moving from "You can read and write forever, but, it only moves and spins for so long" to "You can read forever, but only write so many times." In what is in essence a mobile IOT and Web Based system, the ruggedness and low power of an SSD is very attractive.

I will investigate your suggestions, thank you so much. If it is allowed, I can attach a picture of the mobile server. But I want to stay on topic here. Two interesting but separate discussions to have would be the mechanics of a Ramdisk based database and (more "fun" than unix focused) a bit more on the mobile server.

cracauer@ · Nov 5, 2024

Code:

iostat da0 2

Will display read and write amounts (separately) for /dev/da0 every 2 seconds.

eepete · Nov 5, 2024

cracauer@ said:
Code:

iostat da0 2

Will display read and write amounts (separately) for /dev/da0 every 2 seconds.

Thanks so much for that. The three SSDs are ada0, ada1 and ada2. Here's what I see:

Code:

# iostat ada0 5
       tty             ada0             cpu
 tin  tout KB/t   tps  MB/s  us ni sy in id
   0     9  8.7    16  0.14   0  0  0  0 100
   0     9  9.5    15  0.14   0  0  0  0 100
   0     9  9.3    16  0.14   0  0  0  0 100
   0     9  8.2    16  0.13   0  0  0  0 100
   0     9 10.0    15  0.14   0  0  0  0 100
   0     9  8.9    15  0.13   0  0  0  0 100
   0     9 10.6    15  0.16   0  0  0  0 100
   0     9  9.6    15  0.14   0  0  0  0 100

About 12 GB per day (0.14 MB/s x 86,400 s), or about 4.4 TB per year. On a 3.6 TB SSD, with the MLC-2 meaning you can write every block 1200 times, and with the understanding that there will be other normal file storage activity going on, this is not a big problem. A bit more concerning with the 1 TB MLC-3 and its 300 write cycles, but not overly so. And if any maker can come out with a 2 TB M.2 that's MLC-3 then it's no problem. (On the mobile devices, there is 1 TB for map tiles and 1 TB for general use.)
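
As a cross-check of that arithmetic, here is a minimal sketch that converts a sustained write rate into daily and yearly volume and a rough endurance figure. It assumes decimal units, a constant rate, and no write amplification inside the SSD; `wear_estimate` is a made-up helper name, and the 1200 and 300 cycle figures are simply the ones quoted in this post.

```shell
# Rough endurance estimate from a steady write rate.
# $1 = write rate in MB/s, $2 = capacity in TB, $3 = write cycles per block.
# Assumptions: decimal units, constant rate, write amplification ignored.
wear_estimate() {
    awk -v rate="$1" -v cap="$2" -v cycles="$3" 'BEGIN {
        gb_day  = rate * 86400 / 1000      # MB/s to GB per day
        tb_year = gb_day * 365 / 1000      # GB per day to TB per year
        years   = cap * cycles / tb_year   # total writable TB over yearly TB
        printf "%.1f GB/day, %.1f TB/year, about %.0f years\n", gb_day, tb_year, years
    }'
}

wear_estimate 0.14 3.6 1200   # the 3.6 TB pool at 1200 cycles
wear_estimate 0.14 1 300      # a 1 TB drive at 300 cycles
```

At 0.14 MB/s this gives roughly 12 GB per day and 4.4 TB per year, which is on the order of a thousand years for the 3.6 TB pool and several decades for a 1 TB drive at 300 cycles, so the conclusion that the idle writes are sustainable holds.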

The blink period on the SSD activity LEDs is every 5 seconds. Perhaps this will ring a bell with developers. What changed from FreeBSD 13 to 14 that would write to the storage every 5 seconds? ZFS was running on both versions.

Thanks again, this really helps.

Phishfry · Nov 5, 2024

I dunno if anything changed with LED's but I want to call attention to the driver of leds. AHCI.
Take a look at /dev/led for the LED's exposed to the Operating System.

For Arm boards it is gpioled(4)

marq · Nov 5, 2024

Building upon cracauer@'s suggestion, it looks like it's also possible to see the breakdown of reads versus writes by adding -x to iostat, but annoyingly that outputs a 2-line header before every interval's data, making it difficult to read IMHO.

Try this:

iostat -d -x ada0 1 | head -2 ; iostat -d -x ada0 1 | egrep -v --line-buffered 'extended device statistics|^device'

(the part before the semicolon outputs the 2-line header once, the part after the semicolon outputs "extended" 1-second data continuously while filtering superfluous headers)
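
If only the non-idle intervals are of interest, the same output can be filtered and labelled as read or write activity. This is a minimal sketch, assuming the column order shown by `iostat -x` in this thread (r/s in column 2, w/s in column 3); `classify_io` is a hypothetical helper name, not part of iostat.

```shell
# Label each non-idle interval of `iostat -d -x` output as READ, WRITE or BOTH.
# Assumed columns: $1 device, $2 r/s, $3 w/s, $4 kr/s, $5 kw/s.
classify_io() {
    awk '
        /extended device statistics/ || $1 == "device" { next }   # skip headers
        NF < 5 { next }                                           # skip short lines
        {
            kind = ""
            if ($2 > 0 && $3 > 0) kind = "BOTH"
            else if ($2 > 0)      kind = "READ"
            else if ($3 > 0)      kind = "WRITE"
            if (kind != "")
                printf "%s %s r/s=%s w/s=%s kr/s=%s kw/s=%s\n", $1, kind, $2, $3, $4, $5
        }'
}

# Typical use on FreeBSD (not executed here):
#   iostat -d -x ada0 1 | classify_io
```

Idle seconds produce no output at all, so a burst every few seconds shows up as one line per burst.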

ralphbsz · Nov 5, 2024

Phishfry said:
I dunno if anything changed with LED's but I want to call attention to the driver of leds. AHCI.

The disk activity LEDs for SAS and SATA disks are typically not driven by the host operating system, but either by the disk interface (HBA for SAS), or by the disk itself (the disk's power connector has a pin for driving the activity LED). The latter is commonly used in JBODs or disk enclosures, the former in desktop cases. Because there is typically only one LED, or one per disk, it often has to do double duty with "disk present" or "power on for the disk".

It is possible to re-program either the disk adapter or the disk itself to handle the activity light differently. Some brands/models of disk drives by default have the LED programmed to turn on when power is present, and to blink when the disk is active. But on systems that have many disks (hundreds), the user typically doesn't want to see a hundred green LEDs on when the system is idle; they want to see which disks are busy. So by either loading different firmware into the disk, or by using vendor-specific configuration commands, it is possible to adjust the behavior of the LED. So for the OP's concern, it would theoretically be possible to get the detailed technical manual for the disk drive and disk interface chip, and configure them so only write operations make the LED turn on.

In practice, the iostat command is much more useful.

eepete · Nov 6, 2024

M F said:
Building upon cracauer@'s suggestion, it looks like it's also possible to see the breakdown of reads versus writes by adding -x to iostat, but annoyingly that outputs a 2-line header before every interval's data, making it difficult to read IMHO.

Try this:

iostat -d -x ada0 1 | head -2 ; iostat -d -x ada0 1 | egrep -v --line-buffered 'extended device statistics|^device'

(the part before the semicolon outputs the 2-line header once, the part after the semicolon outputs "extended" 1-second data continuously while filtering superfluous headers)

Shows this:

Code:

                       extended device statistics  
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b  
ada0           0      78      0.0    827.6     0     0     1     0    0   1 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0      73      0.0    596.2     0     0     1     0    0   1 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0      70      0.0    758.5     0     0     1     0    0   1 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0       0      0.0      0.0     0     0     0     0    0   0 
ada0           0      82      0.0    874.3     0     0     1     0    0   1

Now it's time to figure out just what is writing. After saving the Ramdisk based MySql, I'll start killing things off and see if I can find out what is writing. It is clearly sustainable. This is why the large 4 TB drives for a system that will have less than 40 GB of storage used. Should even be good with the 1 TB MLC 3 on the mobile server, it will see a much lower commit rate and file usage than the big hosted server.
I'll start with shutting down MySql, in case the RAM based system has some file or config I missed. Then apache, postfix (but it has its own Ramdisk for the message storage and the system is not on line), and then it gets a little fuzzy...
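
The spacing of the write bursts in the output above can be measured rather than eyeballed. The sketch below counts how many 1-second samples separate successive lines with a non-zero w/s column; `burst_gaps` is a made-up helper name, and the column position is assumed to match the header shown in this post.

```shell
# Print the number of samples between successive write bursts in
# `iostat -d -x <dev> 1` output (assumes w/s is column 3, one sample per second).
burst_gaps() {
    awk '
        /extended/ || $1 == "device" { next }   # skip header lines
        NF < 3 { next }                         # skip blank or short lines
        { n++ }                                 # count data samples
        $3 > 0 { if (last) print n - last; last = n }'
}

# Typical use on FreeBSD (not executed here), watching for one minute:
#   iostat -d -x ada0 1 60 | burst_gaps
```

Applied to the 17 samples posted here, the gaps come out as 6, 5 and 5 seconds. That would be consistent with ZFS flushing a transaction group on its default 5-second timer (the `vfs.zfs.txg.timeout` sysctl), although nothing in this thread confirms that this is the mechanism.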

eepete · Nov 6, 2024

ralphbsz said:
The disk activity LEDs for SAS and SATA disks are typically not driven by the host operating system, but either by the disk interface (HBA for SAS), or by the disk itself (the disk's power connector has a pin for driving the activity LED). The latter is commonly used in JBODs or disk enclosures, the former in desktop cases. Because there is typically only one LED, or one per disk, it often has to do double duty with "disk present" or "power on for the disk".

It is possible to re-program either the disk adapter or the disk itself to handle the activity light differently. Some brands/models of disk drives by default have the LED programmed to turn on when power is present, and to blink when the disk is active. But on systems that have many disks (hundreds), the user typically doesn't want to see a hundred green LEDs on when the system is idle; they want to see which disks are busy. So by either loading different firmware into the disk, or by using vendor-specific configuration commands, it is possible to adjust the behavior of the LED. So for the OP's concern, it would theoretically be possible to get the detailed technical manual for the disk drive and disk interface chip, and configure them so only write operations make the LED turn on.

In practice, the iostat command is much more useful.

The SuperMicro 1019-WTR has 2 leds per drive bay. They are documented in the server manual as "Activity" and "Locate". How that gets mapped into the disk or FreeBSD is not clear. I've got the "Samsung Magician" app on my PC for a drive there, I'll look at that. As I said in my 1st post, I really miss "Read Write" LEDs for hardware. It was always amazing how much you could "see" what's going on with my PDP-8, then later on the PDP-6 and 10 with all the indicators on them.
When I did some 6809 systems, one of the 1st cards I made was a LED panel.

marq · Nov 6, 2024

eepete said:
Now it's time to figure out just what is writing. After saving the Ramdisk based MySql, I'll start killing things off and see if I can find out what is writing. It is clearly sustainable. This is why the large 4 TB drives for a system that will have less than 40 GB of storage used. Should even be good with the 1 TB MLC 3 on the mobile server, it will see a much lower commit rate and file usage than the big hosted server.
I'll start with shutting down MySql, in case the RAM based system has some file or config I missed. Then apache, postfix (but it has its own Ramdisk for the message storage and the system is not on line), and then it gets a little fuzzy...

FWIW, I have a 14.1 system with a 2.5" SATA SSD. When my system is idle, I'm seeing fewer total writes than you are (typically double-digit or low triple-digit K writes when there are writes), but I am running UFS. I am willing to "quiesce" my system and capture iostat output for a few minutes, if you think that would help as a point of comparison. I'm confident the standard daemons do enough filesystem access that there will be some writes, even if none of your applications are running.

Hmmmm, you could try setting the "noatime" mount option on your critical filesystems (e.g.: "mount -u -o noatime /"). I'm not sure if FreeBSD has changed behavior recently, but at least years ago, FreeBSD used to update the atime on all accessed inodes -- just those atime updates can generate measurable filesystem metadata write activity if one reads a lot of files regularly. You probably don't care if file access times are updated, so I think there should be no downside to using the noatime mount option.

marq · Nov 6, 2024

Regarding my previous post, I decided to benchmark atime vs noatime mount options, it was within 4% on #writes, so now I don't think the noatime mount option would provide significant reduction in writes for the idle-system case. Maybe with your particular workload, it might make more difference with your apps running..

mer · Nov 6, 2024

OP says ZFS so I believe the noatime is a property of the dataset or zpool.
zpool history will show pretty much everything ever done to a pool, most start out with the zpool create command. I'm not sure what the default is but one should be able to tell if atime=off for a pool or dataset.
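
On ZFS the setting can be read back per dataset with `zfs get`. A minimal sketch, assuming the scripted output form of `zfs get -H -o name,value atime` (one tab-separated name/value pair per line); `atime_enabled` is a hypothetical helper name that lists only the datasets still updating access times.

```shell
# List datasets whose atime property is "on".
# Expects tab-separated "name<TAB>value" lines on stdin.
atime_enabled() {
    awk -F '\t' '$2 == "on" { print $1 }'
}

# Typical use on a ZFS system (not executed here):
#   zfs get -H -o name,value atime | atime_enabled
# To switch it off for one dataset (the dataset name is a placeholder):
#   zfs set atime=off zroot/var
```

An empty result means no dataset is recording access times, so atime updates can be ruled out as the source of the writes.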

SirDice · Nov 6, 2024

ZFS will show mostly writes to the disks themselves, reading of data will mostly come from ARC (memory) when the caches have been warmed up enough. So you will see comparatively a lot more writes than reads when you look at the actual disk activity.

cracauer@ · Nov 6, 2024

M F said:
Regarding my previous post, I decided to benchmark atime vs noatime mount options, it was within 4% on #writes, so now I don't think the noatime mount option would provide significant reduction in writes for the idle-system case. Maybe with your particular workload, it might make more difference with your apps running..

What did you expect atime to do on an idle machine where nothing is opening files?

marq · Nov 6, 2024

cracauer@ said:
What did you expect atime to do on an idle machine where nothing is opening files?

An idle system in multi-user shows a certain background level of writes when measured with iostat, at least for >= 1 minute.

It occurred to me that one plausible explanation for some of these periodic background writes might be that any reads by any running processes (even ones that had already-open files) likely would cause every corresponding inode atime to be updated after any process reads from any file.

I just expected/wondered if a few reads when "apparently idle" could be turning into most of the background write activity me and the OP have been seeing. But after doing quick measurements of atime vs noatime, at least I didn't see a big difference, so I'm back to wondering if there is some other magic-bullet for the OP that will minimize unnecessary SSD writes..

eepete · Nov 6, 2024

I have atime set to "off" on my system; it's been like that since I installed FreeBSD/ZFS.
I'll investigate more to try to figure out what's writing.
It is interesting that TOP does not show the writes but iostat does. IDK if that provides insight on what to look for. I very much appreciate all the comments!

eepete · Nov 7, 2024

The offending program has been identified. It was one of 5 daemons I have that boot up via another rc.d file. I had some problems with the latest version of the PHP functions to get the IP address, so I had hard-coded it into a daemon. It had the IP address of the digital twin, not the current server I'm working on. As such, attempts to create a socket failed because the IP address was wrong. The log file for that daemon was on the Ramdisk log, so I'm not really sure what was writing out to the SSD. I'll figure that out next. Going forward, I'm grabbing my machine's IP from the /etc/rc.conf file. One of the oldest ways to mess things up: have duplicate config information in different places.... Have been working in Unix since the mid 70's and I'm behaving like it's rookie week.
The only other change I found to make in the system config was to move the pflog over to the log file directory in the Ramdisk /var/db/ramlog
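
Reading the address from /etc/rc.conf instead of hard-coding it could look like the sketch below. It assumes the interface is configured with a line of the form `ifconfig_<iface>="inet A.B.C.D ..."`; the helper name `rc_conf_ip`, the interface name `em0` and the sample address are placeholders, not details from this system.

```shell
# Print the IPv4 address configured for an interface in an rc.conf style file.
# $1 = interface name (for example em0), $2 = file path (default /etc/rc.conf).
# Assumes a line like: ifconfig_em0="inet 192.0.2.10 netmask 255.255.255.0"
rc_conf_ip() {
    awk -v key="ifconfig_$1" '
        index($0, key "=") == 1 {
            line = $0
            sub(/^[^=]*=/, "", line)        # drop the variable name
            gsub(/"/, "", line)             # drop the quotes
            n = split(line, w, /[ \t]+/)
            for (i = 1; i < n; i++)
                if (w[i] == "inet") { ip = w[i + 1]; sub(/\/.*/, "", ip) }
        }
        END { if (ip != "") print ip }' "${2:-/etc/rc.conf}"
}

# Typical use (not executed here):
#   MY_IP=$(rc_conf_ip em0)
```

On FreeBSD, `sysrc -n ifconfig_em0` returns the raw value of the same variable, which may be a simpler starting point. Either way the daemon and the system read one copy of the address, which avoids the duplicate-configuration trap described above.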

With this change done, I can have activity on the web based side and have outside systems, both web based and socket based, up and running, and there are no writes to the SSD. I apologize for the stupid error, it's just so hard to make smart errors. However, this does show that when using FreeBSD in an IoT environment with SSDs, it is possible to significantly reduce disk writes. The system needs to be either hosted with power redundancy or have the "smart UPS that talks to the server" when in a vehicle. The production server, when on line, is communicating with about 100 systems/users/endpoints. This results in about 50 database commits per second. On a major incident (a big structure fire) where there are 6 different agencies responding and a large number of responders and vehicles, I've seen 250 commits per second. The mobile server won't see this much, which is why the small Atom processor works there.

One last system architecture comment: there is another MySql database that is "traditional" and lives on the SSD. It holds the 80% of the total system data which rarely changes. The other 20%, about 80 MB worth, is in the Ramdisk. As such, should there be a need in the system to have a safe place to store information such that if there was a software crash it would not be lost, there is a way to do that. Note also there are mechanisms where, after making a change in the Ramdisk based MySql, a program can flush that table out to SSD right away. Lots of flexibility, as different operations will have different requirements.

Thanks so much to everyone for the help. I'll use iostat if TOP indicates that nothing is going on. And I'll look for other duplicate configuration entries. It might be time for a .conf file for my system...

mer · Nov 7, 2024

Thanks for the update.
