Does each rack-mounted server have an OS?

Do the servers need to be flashed initially with something before these can take control?
That depends. I've worked for companies that installed a basic OS on the server by hand (or remotely via IPMI) and let Ansible or Puppet take care of the rest. And I've worked for companies that had a whole automated infrastructure where the machine would initially PXE boot into an automated installer. It all depends on how much time you can invest in setting this up. For a handful of servers the automatic approach may be a bit too much work to set up with little to gain in the long run (how often would you need to re-install, how long does it take, etc.), but for hundreds of servers it may be worthwhile to spend the time and effort to automate everything.
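As a rough sketch of the first approach (the inventory addresses and playbook name below are just placeholders): once the basic OS is on the box and reachable over SSH, handing it over to Ansible is only a couple of commands.

# Hypothetical inventory listing the freshly installed machines
cat > inventory <<'EOF'
[webservers]
10.0.0.21
10.0.0.22
EOF

# Check that Ansible can reach them over SSH
ansible all -i inventory -m ping

# Then let the playbook take care of everything else
ansible-playbook -i inventory site.yml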

Note that enterprise-grade remote management like IPMI, DRAC and iLO always works, even when the server itself is switched off. You would need to physically unplug the machine for IPMI/DRAC/iLO to stop working.
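For a concrete (hypothetical) example, with ipmitool and the BMC's address and credentials you can do things like this while the host itself is powered off:

# Query and control power through the BMC, no host OS required
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis power status
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis power on

# Attach to the serial-over-LAN console to watch it boot
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret sol activate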
 
OK, this is exactly what I'm talking about. Reliance upon BIOS and semi-intelligent NICs.

Power on -> BIOS -> network card's PXE stack -> Network Boot Program (NBP) downloaded using TFTP from the server into the client's RAM -> NBP's responsibility to perform the next step (a.k.a. second-stage boot).
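On the server side, that whole chain is driven by surprisingly little. A rough sketch on FreeBSD (the addresses, subnet and the ISC dhcpd package are assumptions; any DHCP/TFTP combination will do):

# Serve the NBP over TFTP: uncomment the tftp line in /etc/inetd.conf
# (it serves /tftpboot by default), then enable inetd
sysrc inetd_enable=YES
service inetd start

# FreeBSD ships its PXE loader as /boot/pxeboot
mkdir -p /tftpboot
cp /boot/pxeboot /tftpboot/

# Tell PXE clients where to find it (ISC dhcpd syntax)
cat >> /usr/local/etc/dhcpd.conf <<'EOF'
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  next-server 10.0.0.1;   # TFTP server
  filename "pxeboot";     # the NBP the client loads into RAM
}
EOF
service dhcpd restart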

I don't see why there can't be a ruby gem that works in tandem with bash scripts and a master controller database that can direct this type of stuff.

Ideally, a proper solid database and web stack with a CDN, something along the lines of a half to full rack, would be the target market. OS images can be public domain for generic DHCP, then fine-tuning through SSH. I'm guessing that on a full rack, something like this could be performed over a long liquid lunch.
 
I don't see why there can't be a ruby gem that works in tandem with bash scripts and a master controller database that can direct this type of stuff.
Because you mentioned Ruby, you may want to look into Puppet (Ruby-based) and Puppet's ENC (External Node Classifier). That pretty much does all of that for you.

The PXE installer could automatically install/configure a basic OS plus Puppet, then Puppet can take care of installing/configuring everything else based on the information from an ENC. That's not something you can have up and running in 10 minutes; it's going to take a lot of effort to build. But once it's done....
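To illustrate how simple an ENC can be (the role names and matching below are invented; in practice it would query your master controller database rather than a case statement), Puppet just runs an executable with the node's name and expects YAML describing that node back:

#!/bin/sh
# Minimal ENC sketch: Puppet calls this with the node's certname as $1
node="$1"

case "$node" in
  web*)
    cat <<EOF
classes:
  - webserver
  - haproxy_backend
environment: production
EOF
    ;;
  db*)
    cat <<EOF
classes:
  - postgresql_server
parameters:
  shared_buffers: 64GB
environment: production
EOF
    ;;
  *)
    # Unknown node: no extra classes
    echo "classes:"
    ;;
esac

On the Puppet master this gets wired up with node_terminus = exec and external_nodes = /path/to/that/script in puppet.conf.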
 
OK so over 4800 modules working with it. Sounds like it's the deal. So for a common hearty web stack of db-driven schtuff, what kind of time investment do you think would be needed for something like this?

I don't know why, but designing systems with a 'hot-swap' mentality is just super sexy. This really starts at $1200/year.
 
Typical webservers we use at the moment are all dual 6-core Xeons (with Hyper-Threading that gives you a total of 24 logical cores), 64 GB RAM and 2 x 2 TB HDs. The DB servers usually have 300+ GB RAM and 2-3 TB of SSD in RAID10. Definitely check SuperMicro; FreeBSD usually runs well on most of their models (with some exceptions) and they're reasonably priced. Never had any major issues with their servers.

The setup I built for a client currently consists of 2 servers running HAProxy (load balancer), 4-6 webservers and 2 or 3 DB servers. For management we have 2 log servers (slow CPUs but a lot of storage) and 2 control servers (Puppet masters, Zabbix, Poudriere package building, etc.). Almost everything is set up with Puppet (it took me a couple of weeks to script everything). Thanks to HAProxy I can easily take one or more webservers out of the pool (to update them, for example) without interfering with the running sites. Rebuilding a webserver is done initially by hand for the OS, then Puppet takes care of the rest. It takes about 30 minutes to rebuild one from scratch.
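For illustration, taking a webserver out of the pool is just a command against HAProxy's admin socket (the backend/server names and socket path here are made up, and the stats socket has to be enabled in haproxy.cfg first):

# Drain one webserver before updating it
echo "disable server www_backend/web3" | socat stdio /var/run/haproxy.sock

# ...update/rebuild it, then put it back into rotation
echo "enable server www_backend/web3" | socat stdio /var/run/haproxy.sock

# Quick look at which servers the backend considers up
echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,18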
 
Yeah I've been looking at the Supermicro for just that setup. Interesting you went with SSDs for DB. Speed, but the longevity is my concern there.

Is there a use for flashing SSDs locally on USB (macOS) from a FreeBSD image that's generic? So ports are installed (bash, ntp, file sharing, pico, etc.), then scripting various changes once online? I guess using stick SSDs would need some kind of interface for that. Ah, maybe manual is the answer, using RAID for the OS volume.
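For what it's worth, writing a stock FreeBSD memstick image to a stick from macOS is just dd; the image file name and disk number below are placeholders, and pointing dd at the wrong disk number will destroy that disk:

# Find which /dev/diskN the stick shows up as
diskutil list

# Unmount (not eject) so dd can write to the raw device
diskutil unmountDisk /dev/disk2

# Write the image; rdiskN is the unbuffered device and much faster
sudo dd if=FreeBSD-memstick.img of=/dev/rdisk2 bs=1m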
 
Interesting you went with SSDs for DB. Speed, but the longevity is my concern there.
It wasn't my choice. I just use whatever they think they need. But the rationale was that they needed a lot of IOPS or else the DB simply can't keep up (it was already tuned as fast as possible). That's also the reason why the DB servers have a ridiculous amount of memory. But these are enterprise-grade SSDs; they typically last a lot longer than their consumer counterparts. And everything is either mirrored (OS) or RAID10 (data), so it's not a big problem if one of the SSDs decides to quit.
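If wear is the concern, the drives report it themselves, so it's easy to keep an eye on. A rough sketch (the device and pool names are assumptions, and the exact SMART attribute name varies per vendor):

# Check the wear/endurance attributes a SATA SSD exposes via SMART
smartctl -a /dev/ada2 | egrep -i 'wear|percent|total.*written'

# And check that the mirror/RAID10 vdevs are healthy, so a single
# worn-out SSD is an annoyance rather than an outage
zpool status data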

Is there a use for flashing SSDs locally on USB (macOS) from a FreeBSD image that's generic? So ports are installed (bash, ntp, file sharing, pico, etc.), then scripting various changes once online?
I never bothered with it. It takes less than 10 minutes to install FreeBSD from normal installation media. I then configure pkg(8) to use our own repository, install Puppet and let it take care of the rest. So it's only ~15 minutes of actual work; the rest is installed and configured automatically with Puppet.
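Roughly, those ~15 minutes of manual work look something like this (the repository URL, package name and Puppet master host are placeholders for whatever you actually run):

# Point pkg(8) at the in-house Poudriere repository instead of the default one
mkdir -p /usr/local/etc/pkg/repos
cat > /usr/local/etc/pkg/repos/inhouse.conf <<'EOF'
FreeBSD: { enabled: no }
inhouse: {
  url: "https://pkg.example.com/packages",
  enabled: yes
}
EOF

# Install Puppet from that repo and do the first run against the master;
# from here on everything else is installed/configured by Puppet
pkg update
pkg install -y puppet7
puppet agent --test --server puppet.example.com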

Because they've only got a handful of servers and I rarely need to rebuild one, it wasn't worth the effort to automate things further.
 
Yeah I've been looking at the Supermicro for just that setup.
Supermicro is also my favorite hardware vendor (in those cases where I have the freedom to choose hardware).

Interesting you went with SSDs for DB. Speed, but the longevity is my concern there.
It depends. To begin with, SSDs can live for a very long time if most of the workload is reads and only a small fraction is writes. Such workloads do exist, in particular in database applications. Another thing that modern applications do is optimize their write traffic to prevent "false write sharing" on SSDs, using clever logging techniques.

And remember that hard disks don't live forever either. As a matter of fact, modern hard disks are specified with a maximum data traffic per year; for most models I've seen recently that specification is 550 TB per year (so a 10 TB disk can only be read 55 times per year, or roughly once a week). Note that for SSDs the endurance limit is determined by writes, for spinning rust by total traffic, which makes a huge difference.

And ultimately, if an application simply needs the write speed of SSDs, then endurance becomes secondary. There are customer environments where SSDs need to be replaced every few years. Obviously this is expensive (service personnel need to be scheduled, spare parts need to be stockpiled and shipped, support contracts need to be written and paid for), but having a broken computer may be more expensive. Obviously, this only makes sense with some sort of RAID mechanism, and in most cases only if the replacement can be done "hot", on a live system.
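To make the hot-replacement point concrete, with ZFS underneath it's a couple of commands on a live system (the pool and device names below are invented):

# The pool keeps running in degraded mode when one SSD of a mirror pair dies
zpool status tank

# Physically swap the failed SSD in its hot-swap bay, then resilver onto it
zpool replace tank da6

# Watch the resilver complete; the application never had to stop
zpool status -v tank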

Think of it this way: if you build a high-end storage server, the computer itself (the CPU box, with PCIe disk controller and network cards) may cost roughly $10K, depending on how much memory you put in there. The external disk enclosures (say 5 extra rack-mount JBODs with room for 80 disks each) can easily add another $50K. The disks themselves start at $100K (if you use reasonably inexpensive enterprise-grade nearline drives), all the way up to a quarter or half a million (if you use flash storage like SSDs). Add to that some software, vendor overhead, and support contracts, and you have a very expensive single system, with most of the cost in the storage disks themselves.

But why do people buy these very expensive systems? Because they need them to make money. Usually that means the data on the server is more valuable than the cost of the server, by a large margin. In the overall scheme of a typical corporation, the cost of IT infrastructure is still a minor factor, even though a high-end server cluster (like the one SirDice described above, or like the storage server I'm talking about) can cost as much as a private residence.
 
OK, so there is an approach already. Do the servers need to be flashed initially with something before these can take control? The challenge I see in any application is the very introduction of a "black server" (fresh, right out of the box) to the point of control over the LAN.

No, the only thing you need is to set it to boot from the network (PXE boot); practically all server hardware supports this. The procedure is basically that the network card sends a DHCP request to the network, and a properly configured DHCP server responds with an IP and also a location on the network where it can find bootable media. Most often it's a TFTP server. From that server it downloads the installation program, which is run, and the OS gets installed. This is an oversimplification, but you get the point.
 
Supermicro is also my favorite hardware vendor (in those cases where I have the freedom to choose hardware).<snip>

Longevity was more the issue, as it wasn't long ago that SSDs were much more expensive than they are now. I have never had the opportunity to use a stick SSD, but I would prefer a hot-swappable drive-format SSD for the OS as well.
 
No, the only thing you need is to set it to boot from the network (PXE boot); practically all server hardware supports this. The procedure is basically that the network card sends a DHCP request to the network, and a properly configured DHCP server responds with an IP and also a location on the network where it can find bootable media. Most often it's a TFTP server. From that server it downloads the installation program, which is run, and the OS gets installed. This is an oversimplification, but you get the point.

Yes, sounds like Netboot. This and Puppet sound like all it needs for an accelerated introduction.
 
You mean IPMI, DRAC, iLO and a few more? Those really only allow you to remotely control a machine, but if that machine doesn't have an OS you can't do anything besides changing some UEFI/BIOS parameters. The machine still requires an OS to be functional.
Yes, but you can also install an OS through those tools, e.g. by attaching a virtual installation medium via the remote console.
 