getty status program

A long time ago, I wrote a program in Python that simply displayed the system status on a terminal such as tty1. It used colors (red / yellow / green for status, blue for information) so I could quickly identify problems.
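For anyone wanting to try the same idea, here is a minimal sketch of color-coded status lines using ANSI escape codes. It is not the original program: the level names, the `status_line` helper and the sample values are all made up for illustration.

```python
import sys

# ANSI SGR color codes. The level names are illustrative assumptions,
# not taken from the original program.
COLORS = {
    "ok": "\033[32m",    # green
    "warn": "\033[33m",  # yellow
    "fail": "\033[31m",  # red
    "info": "\033[34m",  # blue
}
RESET = "\033[0m"


def status_line(label, value, level, use_color=True):
    """Format one dashboard line, colored by severity level."""
    text = f"{label:<12} {value}"
    if not use_color:
        return text
    return f"{COLORS[level]}{text}{RESET}"


if __name__ == "__main__":
    # Only emit escape codes when writing to a real terminal.
    use_color = sys.stdout.isatty()
    print(status_line("hostname", "example-host", "info", use_color))
    print(status_line("wifi", "associated", "ok", use_color))
    print(status_line("disk /var", "85% used", "warn", use_color))
    print(status_line("backup", "missed last run", "fail", use_color))
```

To show it on a console such as tty1, the output would have to be redirected to that device by whatever starts the script; how to wire that up depends on the system and is left out here.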

I haven't needed it for several years, mostly because I moved wifi to a physical router instead of running it on the computer through a USB wifi adapter. I was thinking about bringing it back, and I'm wondering whether others have something similar, or how they check that the system is healthy.

My system is fairly stable, and I periodically check the logs to ensure that overnight jobs ran. This is for home use, but I try to run my systems in a controlled, repeatable manner.
 
From time to time, have a look at /var/log/messages, /var/log/console and dmesg.
And run mate-system-monitor.
Question: what could make your system unhealthy, knowing it is not Windows?
 
I have a whole bunch of those, but they are all pretty task-specific and would not port to other people's systems. They typically have both CLI and web interfaces. Examples are attached (screenshots from the HTML versions, except for one text-based one). They all share a super-simple rendering library that can be used for HTML through CGI, and a super-simple text rendering library (which can just make text bold/colored and handles Unicode). All in Python with some C++ mixed in; all the result of many years of tinkering; and all partially incomplete.
 

Attachments

  • Screenshot 2023-02-03 at 2.45.13 PM.png
  • Screenshot 2023-02-03 at 2.45.46 PM.png
  • Screenshot 2023-02-03 at 2.46.27 PM.png
  • Screenshot 2023-02-03 at 2.47.07 PM.png
  • Screenshot 2023-02-03 at 2.51.05 PM.png

Back then, I think the problems were mainly related to the USB adapters. I had USB adapters for both wifi and providing a second NIC. Now that I have an internal PCI LAN adapter, I've not had any issues. Night and day. Back then, to provide some battery backup, I repurposed my old laptops and netbooks as routers. That worked reasonably well, but that hardware was EOL and I later encountered memory issues.

Today I am just as paranoid: I keep a cold spare (a fully configured workstation / router) lying around in case my workstation or router dies.

ralphbsz - That's pretty neat and very similar to what I had. It provides a quick dashboard into things so you don't need to dig, but if you want to dig, you can. I just haven't had the desire to put effort into it since I haven't had issues, and at the same time the dashboards that OPNsense or pfSense provide are enticing.
 
I agree that many of these "monitoring" things done by amateurs like me are "closing the bathroom window after the horse has left the barn". Why do I monitor the UPS (uninterruptible power supply)? Because we live in a rural area with lots of power interruptions, and I've had the server crash way too many times due to screwups. Same with showing that all file systems are mounted and none of them are overfull: that was a reaction to a really bad day when I lost some data due to a full filesystem, but hadn't noticed right away because the rest of "the system" (network access, water and power) continued to work fine.
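The "all file systems mounted and none overfull" check can be sketched in a few lines of Python. This is only an illustration, not the code behind the screenshots: the thresholds, the list of mount points and the function names are assumptions.

```python
import os
import shutil

# Assumed thresholds; tune to taste.
WARN_PCT = 80.0
FAIL_PCT = 90.0


def classify(percent_used, warn=WARN_PCT, fail=FAIL_PCT):
    """Map a fill percentage to a dashboard color."""
    if percent_used >= fail:
        return "red"
    if percent_used >= warn:
        return "yellow"
    return "green"


def check_filesystem(path):
    """Return (path, percent_used or None, color).

    A path that is not a mount point is reported as red, because an
    unmounted file system is exactly the failure we want to notice.
    """
    if not os.path.ismount(path):
        return (path, None, "red")
    usage = shutil.disk_usage(path)
    # used/total can differ slightly from the capacity column of df,
    # which accounts for space reserved for root.
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return (path, percent, classify(percent))


if __name__ == "__main__":
    # Hypothetical list of mount points expected on this machine.
    for mount_point in ["/", "/var", "/home"]:
        path, percent, color = check_filesystem(mount_point)
        shown = "not mounted" if percent is None else f"{percent:5.1f}% used"
        print(f"{color:<6} {path:<10} {shown}")
```

Run from cron, the same function could also trigger an e-mail whenever a result is not green, so the warning does not depend on someone looking at the dashboard.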

And sadly, it's hard to find canned (pre-made) solutions for the real-world monitoring problems. For example:
  • Monitoring the UPS and automatically shutting down the server when the UPS runs out of battery is easy; I use apcupsd, others like NUT. But what that doesn't give me is the ability to see concisely whether I've had zero, a few, or a crapload of power outages recently, and whether they were 1s, 1m, 1h or 1d long. That's why I have python code that shows the battery percentage, and color-codes recent outage events from the log: If you see a lot of red, or a long time distance between red and green, you know something is very wrong.
  • But even that is not good enough. A few months ago, my server crashed due to a power outage. How can that happen, we have both an automatic generator and a UPS? Well, the generator kicks in about 20 seconds after a power outage. Which means the UPS never uses more battery capacity than what 20s requires. I hadn't noticed that the battery in the UPS was getting older and older, until one day it was so weak (and also swollen and leaking) that it only lasted 19s. Oops. So what I should really do is monitor the battery capacity by seeing how much of it is used in 20 seconds worth of outage. That means during an outage read the capacity every second or so, and see how far it dips, and raise an alarm (e-mail) when that looks bad. That's something the stock daemon doesn't do, so I need to code up yet another daemon for that.
  • The FreeBSD daily/weekly/monthly daemons do a fine job of monitoring file system health. Unfortunately, they also do a fine job of monitoring other stuff. So much so that the typical output is long (daily is typically 70-100 lines), and problems are buried in useless chatter. And it doesn't have an alerting system: It dutifully reports that /var/log is 91% full, and the next day 93%, and then 97%, and then the next day I don't get a report because too many things crash. So I need something where I can see in a compact and convenient form that all file systems are green (or maybe one is yellow or red), and that the automatic backup has run recently. And that's why I now have a little monitoring daemon that both sends me e-mails if those things go out of control, and shows the results in a dashboard.
  • Yes, I know there are system and network monitoring solutions (like nagios), but configuring them takes more work than writing a quick ad-hoc monitoring daemon that does a better job.
  • Finally, for monitoring my specific home supply systems (we are responsible for our own water, and have multiple wells, pumps and tanks), there simply isn't a premade solution at a household scale. So I hacked something up, which works surprisingly well (yet is always half broken).
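The battery-dip idea from the UPS bullets above could be sketched roughly as follows. It is not an existing daemon. It assumes apcupsd is running, that `apcaccess status` prints `KEY : value` lines including `STATUS` (containing `ONBATT` while on battery) and `BCHARGE` (for example `100.0 Percent`), and that the reported charge updates often enough to be useful over a 20-second outage. The alarm threshold, the class and the function names are made up, and the e-mail is replaced by a print.

```python
import subprocess
import time

# Assumed threshold: alarm if one outage costs this many percentage points.
DIP_ALARM_PCT = 10.0


def parse_status(text):
    """Parse 'KEY : value' lines (as printed by apcaccess) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def charge_percent(fields):
    """Turn a BCHARGE value such as '100.0 Percent' into a float."""
    return float(fields["BCHARGE"].split()[0])


class DipTracker:
    """Track how far the charge drops during each outage."""

    def __init__(self):
        self.baseline = None  # last charge seen while on mains power
        self.lowest = None    # lowest charge in the current outage

    def update(self, on_battery, charge):
        """Feed one sample; return the dip when an outage ends, else None."""
        if on_battery:
            if self.baseline is None:
                self.baseline = charge
            if self.lowest is None:
                self.lowest = charge
            else:
                self.lowest = min(self.lowest, charge)
            return None
        dip = None
        if self.lowest is not None:
            dip = self.baseline - self.lowest
            self.lowest = None
        self.baseline = charge
        return dip


def monitor(poll_seconds=1.0):
    """Poll apcaccess forever and report outages with a large dip."""
    tracker = DipTracker()
    while True:
        out = subprocess.run(["apcaccess", "status"], capture_output=True,
                             text=True, check=True).stdout
        fields = parse_status(out)
        on_battery = "ONBATT" in fields.get("STATUS", "")
        dip = tracker.update(on_battery, charge_percent(fields))
        if dip is not None and dip >= DIP_ALARM_PCT:
            # A real daemon would send an e-mail here.
            print(f"ALARM: battery dropped {dip:.1f} points in the last outage")
        time.sleep(poll_seconds)
```

Calling `monitor()` starts the polling loop. Keeping the dip logic in `DipTracker`, separate from the polling, means it can be tested with recorded samples instead of waiting for a real outage.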
 
ralphbsz, good points, and nice quote by the way, I will need to remember that for use sometime :).

This sounds like me :). Hmm, I will probably go ahead and build some more ad hoc monitoring solutions. Those are good points about UPSes: that seems like pretty important functionality that should probably be baked into those services or, failing that, available as a plugin. I wonder if there are any open issues requesting that.
 