[SLUG] Configuration Management and Linux Distributions in Large Scale Farms (was "hardware dependencies?")

From: Ian C. Blenke (icblenke@nks.net)
Date: Wed Feb 01 2006 - 12:16:00 EST


steve szmidt wrote:

>Not sure if I get you right, but all you need to do is build them from the
>same version of the same distribution. The same CDs. You are going to end up
>with the exact same kernel on every single computer, even across millions
>of servers. The only thing that will differ is the hardware you have.
>
>

Unfortunately, very few machines are exactly the same. Every batch of
motherboards we order from a given manufacturer's lot (with the same
model number) seems to be basically identical. Each time we put in a new
order, however, we get a different batch of motherboards with slightly
different chipset revisions and the like (cost savings for the
manufacturer). It can be maddening if you're not flexible enough to
handle that kind of rapid change in your hardware supply.

Over time, you build a network of machines that differ widely in their
chipsets but run on very similar CPU cores (which themselves change
gradually).

To handle this, and the various other architectures (everything from
Pentiums to dual-core Xeons), we have standardized on our own "kernel"
metapackage that actually contains "hardware optimized" Debian kpkg
kernels for the various known motherboard configurations. The
metapackage probes the hardware and installs the appropriate kernel for
that box.
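
To give the flavor of it (the package names and the PCI ID below are
invented for this example, not our real ones), the probe in such a
metapackage's postinst boils down to something like:

    #!/bin/sh
    # Pick a kernel flavor based on what the hardware looks like.
    set -e
    if grep -qi 'Xeon' /proc/cpuinfo; then
        FLAVOR=xeon-smp
    elif grep -q '^flags.*\blm\b' /proc/cpuinfo; then
        FLAVOR=em64t            # 64-bit capable CPU
    else
        FLAVOR=pentium4
    fi
    # Board-specific quirks: switch flavors when lspci spots a
    # chipset we know needs its own build (PCI ID is hypothetical).
    if lspci -n | grep -q '8086:3590'; then
        FLAVOR="${FLAVOR}-e7520"
    fi
    apt-get -y install "kernel-image-2.6-nks-$FLAVOR"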

>Then you will have the kernel behavior that differs between Intel/AMD,
>32-bit/64-bit. But if you are building a real cluster it will be the exact same
>hardware, and if it's only similar hardware, like from Intel and AMD, keeping
>your own kernel will not solve the problem inherent in how the CPUs differ.
>
>

Right. We have one unified kernel source tree that I maintain. It has
patches and fixes known to be required for our various hardware
platforms in the farm. I've built a kernel-building harness I call
"kerncob" to build the dozens of kernels with the slightly differing
.config files each one requires. In the end, they all behave identically
from a userspace perspective - the same Debian packages for kernel
modules (ALSA, Openswan, etc.) work everywhere; the only difference is
that the correctly optimized/configured kernel gets installed on each
box automagically via creative use of /proc/cpuinfo, lspci, and so on.
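
A sketch of the idea (the paths and the configs/ layout here are
assumptions for illustration, not kerncob verbatim):

    #!/bin/sh
    # Build one kernel .deb per saved .config from the one shared,
    # patched source tree.
    set -e
    cd /usr/src/linux-nks
    for cfg in /etc/kerncob/configs/*.config; do
        flavor=$(basename "$cfg" .config)
        make-kpkg clean
        cp "$cfg" .config
        yes '' | make oldconfig     # take defaults for any new options
        make-kpkg --revision=nks.1 --append-to-version="-$flavor" \
                  --initrd kernel_image
    done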

>Of course, if you mean using different versions, or even different
>distributions, across your servers, that's plain silly. Just adding trouble,
>or wasted effort to maintain.
>
>

Absolutely. The userspace base image across all machines is identical.
To do otherwise is asking for madness.

>What I think is the far bigger issue is maintenance of the code. Bug and
>security fixes. Who is behind maintaining that for your distro? With, again,
>someone like CentOS, it's RH and their very competent crew. But that can be a
>much bigger hassle.
>
>

We maintain our own "metadistro". I don't have to maintain Debian
packages themselves (unless the maintainers are on crack, which they
sometimes seem to be). I maintain the metapackages that apply the
required Debian packages and any "fixups" needed to bring the systems
into some semblance of sanity.
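
As a hypothetical flavor of it, a metapackage like that can be spun
from a small control file with the stock equivs tool (the nks-ntp name
and its Depends line are illustrative, not one of our real packages):

    Section: admin
    Priority: optional
    Package: nks-ntp
    Version: 1.0
    Maintainer: Ian C. Blenke <icblenke@nks.net>
    Architecture: all
    Depends: ntp-server, ntpdate
    Description: NKS metapackage for NTP
     Pulls in the stock Debian packages and carries the fixups
     that bring them in line with our configuration policy.

Feed that to equivs-build and out comes a .deb whose whole job is to
depend on the right packages and carry the fixups.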
 

>Servers should only have a limited amount of s/w installed and running on
>them. This is of course still relative. There is a much bigger number of
>utilities running these days. Checking quality, connections, etc. But that is
>still far less than a desktop.
>
>

Absolutely. I'd go insane attempting to maintain all of the packages
that make up a desktop distribution. To that end, our desktops pull from
a backport repository that I try to keep stable, yet recent enough for
desktop use. Servers don't pull directly from this apt repository, but
the packages that servers use _are_ included in it.
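
In sources.list terms, a desktop here points at something like the
following (hostnames are illustrative; sarge is current as I write
this):

    # /etc/apt/sources.list on a desktop
    deb http://apt.internal.nks.net/debian sarge main
    deb http://apt.internal.nks.net/backports sarge-backports main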

To blindly trust the upstream maintainers is asking for pain. Trust me.

>I agree that there are a lot of differences across distros. The way
>they integrate various packages. But I'd argue that the kernel is probably
>the least of your problems. How a package has been integrated is in my view
>the problem we see. The version that distro used and how they configured it
>to work for them. There's a massive number of variables. (Maybe that's what
>you meant?)
>
>
Yes. A massive number of variables. And each variable is dependent on
that package maintainer's view of How Things Should Be. At NKS, we have
our own ideas about that, and try to build metapackages that are
centrally maintainable yet apply at the edge in the same reproducible
way every time.

I stress the concept of "little switches" and "big switches" here. A
"little switch" is a configuration option that an open source package
sees inside its configuration file. A "big switch" is a configuration
switch that we set for our metapackage to model the machine's
configuration state. The metapackages on every box turn the "big
switches" into "little switches" for the packages that they envelop. By
developing this way, we can add "big switches" as necessary to enable
features for our customers, yet hide the complexity of the "little
switches" inside each metapackage. Each metapackage begins with a
"template" configuration and proceeds to set the many little switches
as appropriate for each big switch setting. This is enforced every time
from the central configuration repository, and will override any local
configuration changes (debugging and troubleshooting should happen at
the edge, _not_ configuration).
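
A sketch of what that looks like mechanically (the /etc/nks/switches
file and the @TOKEN@ template convention are invented for this example,
not our actual code):

    #!/bin/sh
    # Metapackage postinst: expand big switches into little switches.
    set -e
    . /etc/nks/switches      # big switches, e.g. MAXCLIENTS=256
    # Always regenerate from the template, clobbering local edits;
    # fix things centrally, debug at the edge.
    sed -e "s/@MAXCLIENTS@/$MAXCLIENTS/" \
        -e "s/@SERVERNAME@/$(hostname -f)/" \
        /usr/share/nks-apache/httpd.conf.template \
        > /etc/apache/httpd.conf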

Any machine in our farm can die at any time, and we can redeploy a
base-imaged machine purposed as that machine within minutes. Every
system's configuration state is maintained in our central configuration
management structure, and we keep historical dirvish backups of
everything should we need to restore data (if it could not be recovered
from the dying/dead machine).
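
Schematically, and with the script name, role-package naming, and
backup paths all invented for illustration, a redeploy amounts to:

    #!/bin/sh
    # redeploy.sh <host> <role> - repurpose a freshly base-imaged box.
    set -e
    HOST="$1"; ROLE="$2"
    # The base image already points apt at the internal repositories.
    ssh "root@$HOST" apt-get update
    # The role metapackage drags in its mini-apt dependencies and
    # rewrites every little switch from the central templates.
    ssh "root@$HOST" apt-get -y install "nks-role-$ROLE"
    # Restore anything the dead machine couldn't hand over, from the
    # newest dirvish image on the backup server.
    ssh backup \
        "rsync -a /backup/$HOST/latest/tree/var/data/ root@$HOST:/var/data/"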

So, yes, we maintain our own distribution. It's not Debian per se, but a
metapackage distribution _consisting_ of Debian packages. Sure, it looks
like a Debian machine for all intents and purposes, but our
Configuration Management structure is a scaffolding integrated into the
fabric of every system. You would not apt-get from an external
repository, for example; every metapackage has its own mini-apt
repository of the packages that comprise it.
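
Building one of those mini-apt repositories takes nothing beyond the
stock tools (the layout below is illustrative):

    # On the repository side: index the metapackage's .debs.
    cd /srv/apt/nks-ntp
    dpkg-scanpackages . /dev/null | gzip -9 > Packages.gz

    # On the client side, the matching flat-repo sources.list line:
    #   deb http://apt.internal.nks.net/nks-ntp ./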

I'll never go back to managing systems individually. I'll also never go
back to blindly trusting any public repository. If you want to be in
complete control of the state of every machine in your farm, you _must_
use some form of configuration management. Ours is homegrown, but very
similar to ISConf in a number of ways. I'm also actively looking at
Puppet as a potential migration path.

In my opinion, you _must_ use some form of Configuration Management for
more than a few machines.

Anything less is chaos.

For more information on this kind of thing, visit Steve Traugott's
Infrastructures.org or Luke Kanies' Puppet:

    http://infrastructures.org
    http://reductivelabs.com

 - Ian C. Blenke <icblenke@nks.net> http://ian.blenke.com/



