Your worst day as a FreeBSD user/administrator

... aka: user/sysadmin nightmares.

There are days when I wake up determined to make a really huge/dangerous change in my IT life. Some new/strange/crazy idea pops up in my brain during the REM stage of sleep. When I wake up, I have to do it.

Usually, when a day starts this way, by the end of the day the only things that remain are rubble and debris. And a few tarballs of backups (I'm destructive, but not stupid :e )

At home these days are easily identifiable by my girlfriend shaking her head in disapproval. At work, I am clever (lucky?) enough to destroy & fix all before evening.
The good news is that I usually destroy... damage... gently touch only test machines, not production ones. The bad news is that even though these are "test" machines, I need them, or I simply don't have enough time to fix them "in case something goes wrong"...

A few months ago I had one of these "enlightened" days.

Following some of the great HOWTOs in the forum, I created a fully encrypted geli system in the usual way: an unencrypted boot partition, and everything else on a geli-encrypted disk.

Works flawlessly.

I had spent weeks configuring this machine; it was nearly perfect. It was even a configuration testbed for my home machine - which is sacred!

But that day, I woke up with one objective: upgrade the system from 8.2-RELEASE to 9-STABLE. Easy.

Download sources... ok.
make buildworld/kernel/installworld... done
Reboot... done.

Then the nightmare begins.
Nearly every command resulted in a core dump - even the shell! What's happened? Stay calm and investigate:
A broken geli provider? No.
Some radical change in system libraries? /usr/src/UPDATING shows nothing unusual.
A corrupted filesystem? all is ok.

Panic. And under panic, humans have an abnormal behaviour...

Ok, let's recompile ALL ports...

From bad to worse, it was a hurricane of core dumps.
At that point, the machine was totally FUBAR. I created a new virtual machine, geli encrypted, reinstalled (this time from a 9.0-RELEASE image), restored configuration backups (restoring a dump of an 8.2-RELEASE system onto 9.0-RELEASE did not seem like a good idea...) and finally started the upgrade procedure to 9-STABLE (I'm obstinate).

make buildworld
Wait...

make kernel KERNCONF=MYCONF KODIR=/boot/testing
Wait... WAIT!!! /boot... oh my...!!!

Here was the problem! On the first machine I had forgotten to link the /boot directory (on the encrypted disk) to the /boot partition (in the clear)! The new kernel was installed where the loader never looks, so in practice I was running a 9-STABLE system with an 8.2-RELEASE kernel!

A silly oversight nuclearized my beloved machine (well, to be honest, the "Little Boy" of the situation was me)

In the end, all went fine. I lost some hair and an entire day recreating something that had already existed the previous evening, but at least I learned something!

And the FreeBSD virtual machine still runs like a charm (now 9.1-STABLE).
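
A check along these lines, run before installing the kernel, would have flagged the missing /boot link. It is only a sketch: the function name boot_is_linked and the /bootpart mount point are assumptions for illustration, not something taken from the HOWTOs.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a symlink that resolves to
# the same directory as $2 (the boot directory on the clear partition).
boot_is_linked() {
    link=$1
    target=$2
    [ -L "$link" ] || return 1      # a plain directory means the link is missing
    [ -d "$target" ] || return 1    # the clear partition must be mounted
    # Compare the physical paths both names resolve to.
    [ "$(cd "$link" 2>/dev/null && pwd -P)" = "$(cd "$target" 2>/dev/null && pwd -P)" ]
}

# Example (the mount point is an assumption):
#   boot_is_linked /boot /bootpart/boot || echo "WARNING: /boot is not linked" >&2
```

If the check fails, the kernel about to be installed would land on the encrypted disk and never be booted.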
 
The day I had many files to go through and had rigged up an XFCE action to rm * $1. Of course I was tired, I had a bunch of things selected, and my shifting hand slipped and hit the wrong thing in the menu. Suddenly the hard drive was awfully busy. "Hey, where did that stuff go!?" It didn't take me long to realize it was gone. So I shut down the computer, took the drive out and put it on the shelf, because I had to leave and didn't have enough time to work on it further.

So one week later, I had a copy of UFS Explorer and the drive in an enclosure. I am happy to say I got the vast majority of my files back in one piece. The lesson learned: don't bypass fail-safes. :)
 
The day the capybara-webkit ruby gem was not working AGAIN.
Finally I gave up and asked my boss to get me a Mac (at work). ;/
 
Worst one was as a user, still rather wet behind the ears when it came to FreeBSD or UNIX in general. Did a # rm -rf * as root thinking I was in /usr/ports. I wanted to nuke my ports tree because I thought it was corrupted.

Hmmm... Why is it taking so long? The directory isn't that big. Better stop it. Unfortunately I wasn't in /usr/ports/, it was /usr/. So, I nuked half the system, including my own files in /usr/home x(

Needless to say I have since changed the prompt to include the current directory and I've stopped using rm -rf *. I now always add the directory: rm -rf /usr/ports/* for example. The -f makes rm(1) a very dangerous tool!
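
The "always name the directory" habit can be turned into a small wrapper. This is a sketch under assumptions: clean_dir is a made-up name, and the "at least two levels deep" rule is an arbitrary safety margin, not a standard. It relies on the -mindepth and -delete options, which FreeBSD and GNU find both have but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical wrapper: empty a directory only when it is given as an
# absolute path that resolves to at least two levels deep
# (/usr/ports is accepted, /usr and / are refused).
clean_dir() {
    case $1 in
        /*) ;;
        *) echo "clean_dir: give an absolute path, got '$1'" >&2; return 1 ;;
    esac
    # Resolve symlinks and ".." so /usr/ports/.. is seen for what it is.
    real=$(cd "$1" 2>/dev/null && pwd -P) || {
        echo "clean_dir: cannot enter '$1'" >&2
        return 1
    }
    case $real in
        /*/*) ;;
        *) echo "clean_dir: refusing shallow path '$real'" >&2; return 1 ;;
    esac
    # -mindepth 1 keeps the directory itself and also removes dotfiles,
    # which a bare * would skip.
    find "$real" -mindepth 1 -delete
}

# Example:  clean_dir /usr/ports
```

The current directory never enters into it, so being in /usr instead of /usr/ports no longer matters.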
 
The day I stayed up all night configuring a replacement home server for my mail/web/storage server. I decided to configure it using full root ZFS on top of GELI with a thumb drive to store the GELI key file. Mind you, this was my first time doing such a setup.

The setup was perfect, worked well and I had finally copied over all data/configuration. Total configuration time was right about 12 hours (including ports build time).

Then came the final reboot and the system did not boot. Could not find kernel, /boot directory on thumb drive was empty. Then it dawned on me. I had reformatted the thumb drive to contain a small partition besides the /boot one and had forgotten to copy over the kernel and GELI key file. Also had forgotten to copy over the GELI key file to my desktop for backup.

I had a perfect and secure system, too bad I could not get access to it.

Live and learn.
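
A pre-reboot checklist can be as small as the sketch below. The function name require_files and every path in the example are assumptions; the point is only to refuse to reboot while the kernel, the key file or the key backup is missing or empty.

```shell
#!/bin/sh
# Hypothetical pre-reboot check: return non-zero if any listed file
# is missing or has zero length.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "MISSING or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example with assumed paths (thumb drive mounted on /mnt/usb):
#   require_files /mnt/usb/boot/kernel/kernel \
#                 /mnt/usb/boot/keys/root.key \
#                 /backup/root.key || echo "do NOT reboot yet" >&2
```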
 
Trying to port software that has no FreeBSD support and whose developers are unaware of other systems. I've tried this with the Sugar desktop and am trying it with Creative Labs. There are times when the hardware does not work with the software, e.g. VirtualBox on an 800x600 screen. Often the solution emerges out of the chaos, or the effort ends in stalemate.
 
Got a call at 02:00, none of the printers were printing. The FreeBSD print server was down. No remote access, had to go to it and restart. The plan was to fix the problem the next day. However, the users still couldn't print, because every printer said it had the wrong paper size. Apparently when none of the printers would print, they just assumed that changing all of the trays to random paper sizes might fix it. Don't try to understand that last part, they couldn't explain any sort of thinking behind it.
 
As a user, I accidentally deleted Xorg while trying to remove Xfce, just after I had switched to FreeBSD from Debian.
Now, I just delete software using pkg_rmleaves to avoid this mistake.
 
This isn't FreeBSD specific, but it happened when I was a noob Unix admin with enough knowledge to be dangerous (and a root shell :D), working at a small ISP, on a production box (running most services back then, including SMTP/POP and HTTP)...

I had a bunch of "dotfiles" owned by root in my home directory (caused by running commands like screen or whatever as root). I'd stopped running those things as root, but the dotfiles could not be written to by my account.

I'll fix this!

chown -R jrose .*

Was on a little Sun Netra, running Solaris 2.6. The command didn't come back to the prompt for a couple of seconds and then the realization hit.

.* includes ..

Recursively.

The command was working its way out of ~ and up into /home. On its way up to / eventually.

Luckily I killed it before it got too far. It had merely changed home directory ownership of 250 user accounts :)
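
The usual way around this trap is a pair of patterns that match every dotfile but can never match . or .. themselves. The sketch below only lists what the patterns select; list_dotfiles is a name made up for the example.

```shell
#!/bin/sh
# Print the dotfiles in directory $1, one per line, never "." or "..".
list_dotfiles() (
    cd "$1" || exit 1
    # .[!.]*  a dot followed by a non-dot character (.profile, .screenrc)
    # ..?*    two dots followed by at least one more character (..odd)
    for f in .[!.]* ..?*; do
        # A pattern that matches nothing is left as literal text; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf '%s\n' "$f"
    done
)

# The ownership fix from the story, run from inside the home directory,
# would then read:
#   chown -R jrose .[!.]* ..?*
```

Running the listing first shows exactly what a recursive command is about to touch.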



Another bad day was editing /etc/passwd in vi for some reason (same ISP, same production system actually).

A shortcut to save and exit in vi is
:x

Unfortunately I did
:X

And then it didn't quit, so I then did
:q


What does :X do, you ask? It is/was a command specific to the Solaris/Sun(?) version of vi for "save encrypted file" (or similar). I believe it used crypt (memory fuzzy, it's been a long time).

Restore password file from last backup.... luckily the uber-admin mentor had 6-hourly backups.

To this day, I will only ever use :wq :)



Widespread use of beadm in the future, along with regular ZFS snapshots, can only be a good thing... :)
 
Back in the dotcom days of 2000, I was working at an online greeting card company. We used a NetApp filer for backend storage for all of the cards. I was working on cleaning a directory structure up that didn't have any cards in it. Unfortunately, I was in the wrong directory and a simple # rm -rf * resulted in me deleting around 200,000 cards. I remember the cold sweat that I broke out in as I reached for the phone and called my boss. He was amazingly cool about it, even though we didn't have any backups.
 
My worst moment came when swapping out an old hard drive for a new one on my home machine. I forget the exact steps I had taken (I was tired at the time), but the key moment came when I ran something like rm -R /mnt/disk/home/* /home instead of a line like cp -R /mnt/disk/home/* /home.

So instead of copying all the user files from my old drive to the new computer, I ended up erasing all of my files. As it turned out, my last backup was corrupted and so I ended up losing pretty much everything I'd done for the past month. Luckily I had other backups and could piece together a lot of the rest, but I felt pretty stupid nonetheless.
 
It wasn't that long ago (2007/8?) that I had an old /personal/ FreeBSD server I wanted to do a HW upgrade on. I didn't have spare disks to swap my data to (<2TB at that time). I found one Debian server with enough free space to do that. I copied the data to it directly from those disks to speed things up (the FreeBSD disks were connected directly to the Debian box in ro mode). I built the new server, prepped the disks and was ready to migrate the data back to FreeBSD.

As I was satisfied with the speed it took to copy data from FreeBSD to Debian, I decided to use this method again, now vice versa. FreeBSD was able to read Linux filesystem (I think it was reiser, but I wouldn't bet money on it). Copying went smoothly, everything seemed to be hunky-dory.

A very disappointing moment came when I wanted to use the data I had copied. Each and every file was corrupted, full of BS data. The only things that were correct were the directory tree and the names of the files.

Since then I always run at least cksum or md5 on any file being transferred to my server. And I don't trust any foreign filesystem.
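
A minimal sketch of that habit, using the POSIX cksum utility. verified_copy is a made-up name, and a CRC only catches accidental corruption; sha256 on FreeBSD or sha256sum on Linux would be the stronger choice. Note also that reading a file back right after writing it may be served from cache, so for a real migration it is safer to re-check after a remount.

```shell
#!/bin/sh
# Hypothetical helper: copy $1 to $2 and fail loudly unless the
# checksum and size of the copy match the source afterwards.
verified_copy() {
    src=$1
    dst=$2
    cp "$src" "$dst" || return 1
    # Reading from stdin makes cksum print only "<crc> <bytes>",
    # so the two results are equal exactly when the contents agree.
    sum_src=$(cksum < "$src") || return 1
    sum_dst=$(cksum < "$dst") || return 1
    if [ "$sum_src" != "$sum_dst" ]; then
        echo "verified_copy: MISMATCH for $dst" >&2
        return 1
    fi
}

# Example (paths are assumptions):
#   verified_copy /mnt/old/photo.jpg /data/photo.jpg
```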

__

And one typical newbie mistake I think a lot of us went through. Back in 2002 I was a unix world newbie starting to learn on Slackware (9.0 I think) with Windows XP as my primary OS. All my data (mp3s and some pictures mostly) was on the same disk I tried to install Slack on. That was when I learnt the valuable lesson on backups .. sort of use it or lose it. Yes, I made a mistake and lost the partition table. At that time I didn't do any MBR backup, so I lost the data.
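
For reference, an MBR backup is a single dd call. The sketch below assumes a classic MBR-partitioned disk; backup_mbr and the device name in the example are assumptions. The first 512-byte sector holds the boot code and the four primary partition entries, so logical partitions inside an extended partition are not covered.

```shell
#!/bin/sh
# Hypothetical helper: save the first 512-byte sector of disk $1 to file $2.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Example (device name is an assumption, run as root):
#   backup_mbr /dev/sda /root/sda-mbr.bin
```

Keep the resulting file somewhere other than the disk it describes.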
 
When I was getting my private pilot license in Miami back in 1996, they used to say that a pilot who has never landed with the gear up is a pilot who will.

So, I guess that an administrator who has never done an:

rm -rf *

in the wrong directory, is an inexperienced administrator.

I have never experienced landing without gear, although it has been a long time. But I have nuked an entire root fs!
 
gkontos said:
When I was getting my private pilot license in Miami back in 1996, they used to say that a pilot who has never landed with the gear up is a pilot who will.

Nice. Greetings from a PPL(A) student :) I'll keep that in mind :) Though I don't need to worry about that on the PS-28 Cruiser ;-)
 
matoatlantis said:
Nice. Greetings from a PPL(A) student :) I'll keep that in mind :) Though I don't need to worry about that on the PS-28 Cruiser ;-)

Some of the advantages of flying fixed gears!
 
When first learning FreeBSD I set up:
Code:
rm: aliased to echo " TMP? TEMP? " && "/bin/rm -i $* "
I was new to tmp rather than "temp" on the C drive... The remainder removes rm
functionality unless one types the full path, which usually takes enough effort
that one rereads one's command at least twice, and that nearly always
stops typos from continuing through to the ENTER key...
 
gkontos said:
When I was getting my private pilot license in Miami back in 1996, they used to say that a pilot who has never landed with the gear up is a pilot who will.

So, I guess that an administrator who has never done an:

rm -rf *

in the wrong directory, is an inexperienced administrator.

PPL Saying hi to another. Lucky for me, no gear up landing yet either, but a few rm -rf /*. Guess that makes me a heavily experienced administrator ;)
 
There was a day when I deleted all ports from one of the FreeBSD boxen at work and started rebuilding ports from scratch. Unfortunately, there were problems with the ports tree that day and I wound up having to find a backup with an earlier set of packages I had built to get the machine running properly again. That's when I learned how to build packages from ports in a jail.

It sucks when you have work to do and the world isn't cooperating.

Note that this wasn't my worst computing day ever: see this older thread for that.
 
Oh well, I might as well go ahead and join the hall of shame I suppose ;)

Now, this isn't FreeBSD related, but since the whole thing revolves around IPF, which is also used in FreeBSD, I suppose it could fit in. I know this forum is about being offtopic but I don't want to overdo it; my story is actually about Sun Solaris.

I run my own company, which normally consists of just me, but I often also get help from some of my direct surroundings. It can be something small, like my girlfriend volunteering to monitor the phone (and play Skyrim on the PS3 ;)) when I'm very busy with a project, or someone offering me a lift.

Recently I've been doing this full-time, but before that I ran it alongside a day job, always making sure my employer knew about it and, needless to say, always keeping my priorities straight (back then it only involved website hosting, nowadays I do a lot more).

My story revolves around a rather big customer who was doing some sort of auction on his website and asked me if I could monitor the thing. The problem was that the site sometimes stalled a bit, which had everything to do with the way it was built and hardly maintained. Back then I didn't do website development at all, only hosting, so the best I could do was provide an optimized environment.

Because this was quite big I ended up turning to Amazon's EC2 to provision a few "backup servers" and I hired a PHP programmer / systems administrator who would be able to intervene if necessary. The problem was that the website was one big mess of code, and it would be way too expensive to rebuild the site from the ground up. But this guy said that he should be able to identify problem spots and then could look into how (and if) he could fix them.

So the idea was that he gained access to the Solaris server and monitored the Apache server (and some other components), while I had 3 backup servers standing by in the Amazon cloud and had already set up something like a cluster which would make sure the site remained active while the load was spread (very long story, it involved more than your average round-robin in a DNS server).


Now, the problem was that this website got pestered by some "kiddiots" (as I like to call them, the combination of your average script kiddie and someone who hardly has a clue what he's doing) which almost every time used IP ranges which originated from Romania.

Romania has several IP ranges, but some of them overlap a bit with ranges which are also used in The Netherlands. Obviously I'm not going into detail here, it's bad enough as it is, but let's just say that at least the first digit overlapped (it's been a while, I think it was actually the first two, but alas).

So on a Saturday evening the auction started and, as expected, attracted quite some visitors. The programmer and I were both connected using MSN Messenger and we were both busy monitoring the server. He was on the lookout for "trouble spots" whereas I was focussing my attention on abuse.

And then I suddenly started noticing weird connection attempts. Completely convinced it was those nasty Romanian "kiddiots", I applied changes to the firewall, reloaded it and from that moment on had very effectively locked out both the programmer and myself from the main server :)

So here I was, now paying someone who couldn't do anything for the whole evening. Brilliant business strategy right there :)

Fortunately for me, I do have a feeling for the kind of people I hire, and he didn't abuse the situation one bit. In fact, he even took part in some of the auctions himself, merely to check whether he could spot issues that way (he also went through some of the source code). Everything went well, the server didn't collapse and we didn't have to resort to using the backups. Fortunately for me, it also didn't result in spending a lot of money for nothing.

But even so, I kinda learned that it's best not to be too hasty when trying to lock out "bad guys" from the server without carefully double-checking what is going on.
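
One common guard against locking yourself out is to arm an automatic rollback before reloading the rules. The following is a sketch of that idea, not what was used that evening: apply_with_rollback, the confirm file and the rule file names in the example are all assumptions. Only the ipf -Fa -f reload form is the standard one.

```shell
#!/bin/sh
# apply_with_rollback APPLY ROLLBACK CONFIRM_FILE [SECONDS]
# Runs APPLY, waits, then runs ROLLBACK unless CONFIRM_FILE has appeared.
apply_with_rollback() {
    apply=$1
    rollback=$2
    confirm=$3
    wait_s=${4:-120}
    rm -f "$confirm"
    # If the new rules do not even load, restore the old ones at once.
    sh -c "$apply" || { sh -c "$rollback"; return 1; }
    sleep "$wait_s"
    if [ -e "$confirm" ]; then
        rm -f "$confirm"
        return 0    # the operator proved they can still get in
    fi
    echo "no confirmation received, rolling back" >&2
    sh -c "$rollback"
    return 2
}

# Example (rule file names are hypothetical):
#   apply_with_rollback 'ipf -Fa -f /etc/ipf.rules.new' \
#                       'ipf -Fa -f /etc/ipf.rules' /tmp/fw.ok 120
# then, from a NEW login session within two minutes:  touch /tmp/fw.ok
```

It should be started under nohup or screen so that it survives the very disconnection it is meant to recover from.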
 
gkontos said:
When I was getting my private pilot license in Miami back in 1996, they used to say that a pilot who has never landed with the gear up is a pilot who will.

Each landing you can walk away from on your own is a good one.

And to add to the read mail episodes (the read mail command, together with the real fast switch), here is something that happened.

It's the day before release, you want to package it all together and ship it. First, check in the lot. Before that, clean the thing up: there are tons of editor backups around, somefile.c~. So you construct a command line to find and delete everything that ends in a ~, something like "rm *~". And for those who use German keyboard mappings, just check where the ~ is. For those who do not know that layout, the key is on the right edge of the keyboard, right next to a somewhat larger key - and so disaster smiles, and strikes.
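
A safer shape for that cleanup is to let find do the matching with a quoted pattern. In the sketch below remove_editor_backups is a made-up name. Because the pattern sits inside quotes, the shell never expands a bare *, and at an interactive prompt a premature Enter inside the open quote only produces a continuation prompt.

```shell
#!/bin/sh
# Hypothetical helper: delete editor backup files (names ending in ~)
# below directory $1, printing each one as it goes.
remove_editor_backups() {
    find "$1" -type f -name '*~' -print -exec rm -f -- {} +
}

# Dry run first, to see what would be removed:
#   find . -type f -name '*~' -print
# Then:
#   remove_editor_backups .
```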
 
gkontos said:
So, I guess that an administrator who has never done an:

rm -rf *

in the wrong directory, is an inexperienced administrator.

After 10 years of working with UNIX I actually have never done this, so maybe I'm not trying hard enough :e However I indirectly did, so I guess it counts.

I was teaching AIX to one of the uni grads at work and I told him to remove a user simply by:

rm -r ~user
rmuser -p user

Except one day he did:

rm -r ~ user

Unfortunately to this day IBM still has not changed root's home directory to /root like every other UNIX system.

I got blamed anyway since I wasn't around to defend myself and inherited a nice nickname for teaching the grads bad habits.
 
Blueprint said:
Unfortunately to this day IBM still has not changed root's home directory to /root like every other UNIX system.

Actually, HP-UX also sets root's home to / by default. Per our company standard, we move it to a different location (not /root either) to avoid any problems. As we use golden images (recovery images) by default, we don't need to pay attention to it that often (usually only when a vanilla install is done of a new release for which no golden image exists yet and an OS upgrade was declined for some reason).

Anyway, that had to hurt when you saw that command being executed. :/

I remember one situation when a former colleague of mine from the SAP team was working on something and had to rename Oracle bdf files. Don't ask me why he did that manually, I'm no Oracle/SAP expert. He was working the graveyard shift (lower support) and executed this command (file names are fictional):

# rm prod01.bdf prod01-backup.bdf
Code:
rm: prod01-backup.bdf non-existent

It took him a while to realize what he had done. This was a big fiasco, as there had been a problem with the backup that night and recovery had to be done from a few days earlier. Unfortunately, this cost him his job in the end (the full story would be a long one to tell).
 