Soooo… I’ve been wanting to poke about with a cluster of sorts for a while. I’ve spent the past 10+ years building, managing, and generally dealing with large clusters of machines: first as the second SRE hired at Twitter, working on the Mesos/Aurora compute management system, and then at Google, building software and systems to make large clusters work. But how does one build their own cluster? I mean, I live in San Francisco, and do not have a Hobby Cave to stash a bunch of machines in. Also, if one machine costs $1k, ten of them… well, that’s $10k.
Enter “Cloud Computing”, the answer to all things compute. Right? Well, on some level, you can spin up (and down; remember, turn things off when you’re not using them!) quickly. Using your cloud provider’s pricing calculator of choice, say Google Cloud’s, let’s toss in 10 nodes of compute, each with 2 virtual CPUs, 16GB of RAM, and 32GB of “disk”. Once we hit the calculate button, we end up with a price tag close to $700 per month. Of course, that’s for 24/7 use… dropping it down to, say, 2 hours per day, we’re at a much more reasonable $87/month. Not bad. But… you’d better not forget to turn that cluster down when you’re not tinkering. Of course, there are any number of cloud products on offer that will help you manage a cluster of machines, growing and shrinking it as necessary. However, most of them are not designed to save you money, but to maximize your ability to scale and offer a reliable service to your customers.
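If you want to play with the numbers yourself, here’s a minimal sketch of that math in Python. The per-node hourly rate is an assumption backed out of the ~$700/month calculator result, not a published GCP price, and real calculators layer on sustained-use and other discounts, which is why this linear model comes out lower than the $87 figure above for the 2-hours-per-day case:

```python
# Back-of-the-envelope cloud cost model. The hourly rate below is an
# assumption reverse-engineered from the ~$700/month calculator result
# (10 nodes, 2 vCPU / 16GB RAM / 32GB disk each), not an official price.
# Real bills apply discounts, so this linear model is rough.
NODES = 10
HOURS_PER_MONTH = 730  # average month
HOURLY_RATE_PER_NODE = 700 / (NODES * HOURS_PER_MONTH)  # ~$0.096/node-hour

def monthly_cost(hours_per_day: float, nodes: int = NODES) -> float:
    """Estimated monthly bill for `nodes` running `hours_per_day`."""
    return hours_per_day * 30 * nodes * HOURLY_RATE_PER_NODE

print(f"24/7 use:  ${monthly_cost(24):,.0f}/month")  # ~$690
print(f"2 hrs/day: ${monthly_cost(2):,.0f}/month")   # ~$58 before discounts
```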
So, how do we build a “cheap” cluster? One that fits within the confines of a studio apartment in San Francisco, does not use too much power, does not generate too much heat, and is quiet (because sleep is important), etc.
We’ll start with some basic requirements:
- Cheap, yeah… This will not be some CPU/GPU powerhouse
- Simple 1Gb/s ethernet will be sufficient
- Passive cooling would be ideal; if there are fans, quiet is paramount
- Somewhere between 8 and 16 nodes should be sufficient
- Intel/AMD or ARM CPUs, whatever is cheaper and available
- 4 CPU cores, 8 threads per node would be nice, but we can compromise
- 8GB RAM minimum, 16GB or 32GB would be better, and 64GB would be awesome
- A minimum of 32GB of local SATA SSD, preferably expandable via some standard I/O interface of at least “SATA speed”
- Ability to run Linux, preferably stock Ubuntu or similar
- Small form factor, the full stack of cluster nodes needs to fit in my (very) small closet space
- As an additional wish list, I’d add things like:
  - Network bootable
  - Remote management
  - Low power, because heat, noise, and electricity all cost money
The first option that came to mind was some sort of Intel NUC system. They’re small, Intel-based, usually have options for plenty of on-board RAM and SSD, and usually some sort of ethernet interface. They meet, and often strictly exceed, many of the minimum requirements. However, most of them are simply too expensive, costing several hundred dollars each on the low end. Even looking at other, non-NUC mini PCs, many of the cheap options are still more than $100 for a bare-bones box without RAM or SSD. While many of these boxes looked ideal, I could never quite get myself to plunk down $3k+ for a set of 8 nodes.
The second option that came to mind was some sort of ARM-based single board computer (SBC), say a Raspberry Pi. However, most of these are seriously lacking in the RAM department. Finding a small, cheap, ARM-based board with 16GB or more of RAM proved about as easy as finding unobtainium.
The next option was a single large machine running a bunch of VMs. This looked promising at first, but the price usually ended up being prohibitive. By the time you spec out a machine that can hold 200GB of RAM, you’re in expensive territory. Also, some of the projects I had in mind would not be easily modeled on a single machine with a bunch of VMs, which pushed me towards a pile of physical hardware as my solution.
Enter the Chromebook. Or, more specifically, the Chromebox. These are devices that run custom firmware for the sole purpose of running the Chrome web browser. Usually they have minimal local SSD storage, a tiny amount of RAM, and the suckiest CPU that can make things run barely acceptably well. Typical specs are things like a 2-core Celeron CPU, 4GB of RAM, and 16GB-32GB of some very slow SSD. Many come with 1GbE and Wifi. There is a wide (very wide) range of these devices, with the newest ones being much more powerful, but at a similar price point to the Intel NUC style mini-PCs, say $500-$700 each. Too expensive.
However, there are versions of these Chromeboxes that are… to put it politely, no longer desired. They’re too old and too slow to run current versions of the Chrome browser (or today’s web apps). Many of these boxes are simply discarded and replaced with something more powerful. This is where eBay becomes your friend. Searching for “Chromebox Lot”, you will find a number of sellers looking to unload various sized lots of undesirable Chromeboxes. With some searching, I managed to find a lot of 10 HP Chromebox G2 units for less than $135. Score! (Yes, in a few months or years they’ll be even cheaper…) I figured that I’d be able to upgrade them to at least 16GB of RAM, plus whatever SATA-based M.2 storage I could scrounge if the on-board option was not good enough. The onboard Celeron 3865U was not much of a powerhouse, but it would do. It had all of the necessary virtualization options, and on paper supported up to 32GB of RAM. The HP Chromebox G2 documentation said it supported up to 16GB. I figured that 16GB would be good enough, and if 32GB worked, I’d be very happy. An NVMe interface to the SSD would have been nice, but SATA should do fine.
Once the lot of 10 Chromeboxes was delivered, I used various sources on the interwebs to figure out how to flash non-Chrome firmware onto each of these boxes. This basically turned each of them into a “normal” PC, able to boot a standard Ubuntu Server USB installer. Everything worked out of the box: my machines had 4GB of RAM and 32GB of SSD, and the Ubuntu install worked on the first try. It worked well enough that I went out and bought a bunch of RAM, some 8GB, 16GB, and 32GB sticks, to test how much RAM these boxes could handle. Long story short(er), the G2 worked fine with up to 32GB of DDR4 RAM. With 64GB installed, the system booted (very slowly) but would only find 32GB. Which is fine; the 32GB sticks were a “bit expensive” anyway. The 32GB option (2x16GB) can be found for sub $50, and the 16GB option (2x8GB) for sub $30. So with 10 nodes, that’s another $300-$500 worth of RAM, pushing the total cost with RAM to somewhere in the $435-$635 range for this 10 node cluster. Of course, there will be other costs: power, network, possibly disk.
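As a quick sanity check after each install, something like the following (a minimal sketch, reading the standard /proc interfaces on Linux) confirms that the Celeron’s VT-x made it through the firmware swap and shows how much of the installed RAM the kernel actually found:

```python
#!/usr/bin/env python3
"""Sanity check for a freshly flashed node: confirm the CPU exposes
Intel VT-x (the `vmx` flag) and report the RAM visible to the kernel."""

def cpu_has_vmx() -> bool:
    # /proc/cpuinfo lists a `flags` line per logical CPU; `vmx` means VT-x.
    with open("/proc/cpuinfo") as f:
        return any(line.startswith("flags") and " vmx" in line for line in f)

def visible_ram_gib() -> float:
    # MemTotal in /proc/meminfo is reported in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemTotal not found")

if __name__ == "__main__":
    print(f"VT-x available: {cpu_has_vmx()}")
    print(f"RAM visible to kernel: {visible_ram_gib():.1f} GiB")
```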
Looking at networking, the simplest option would be to let these nodes talk over Wifi (each box came with a Wifi card populated). No switches, no wires. However, 10 rather busy nodes chattering over Wifi would likely be a problem. So tossing a 24 port gigabit unmanaged switch into the mix adds about $100 to the total. Add maybe another $50 in brand new CAT5e patch cables (I’m sure I have a bunch of old ethernet cables hanging around here somewhere). Which brings the total to $585-$785. We’re still missing upgraded storage, and… well, power.
So yeah… at this point I’m looking at how to power this whole mess. Ideally it would all be powered from a single (or maybe two) “things” that plug into the wall and then fan out to power each node. It turns out these boxes will take USB-C power as well as 19V DC via a barrel jack. I may be able to figure out some sort of DC power supply, or obtain some type of “conference room laptop power octopus thing”. If push comes to shove, a 65W DC power adapter can be had for roughly $10 each in bulk, which would add another $100-$120 to the build, bringing the total to $685-$905. Note that this is roughly equal to the price of the GCP cluster running 24/7 for a single month. Yes, despite the similar RAM and SSD sizes, the GCP option will SMOKE this cluster in performance.
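As an aside on the power draw itself: the 3865U is a 15W TDP part, so the whole stack should pull far less than the 650W those ten adapters could theoretically supply. Here’s a rough sketch of what that means for the electric bill; the per-node average draw, switch draw, and $/kWh rate are all assumptions, not measurements:

```python
# Rough electricity cost estimate for the cluster running 24/7.
# Per-node draw, switch draw, and $/kWh are assumptions, not measurements.
NODES = 10
WATTS_PER_NODE = 10   # assumed average draw; the 3865U is a 15W TDP part
SWITCH_WATTS = 15     # assumed draw for the gigabit switch
PRICE_PER_KWH = 0.30  # assumed rate; check your own utility bill

total_kw = (NODES * WATTS_PER_NODE + SWITCH_WATTS) / 1000
monthly_kwh = total_kw * 24 * 30
print(f"~{monthly_kwh:.0f} kWh/month, about ${monthly_kwh * PRICE_PER_KWH:.0f}/month")
```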
Finally, adding some storage: with some searching, a 256GB M.2 SATA stick can be found for sub $20 each, making the final total $885-$1105. However, I have ideas that may not require the extra storage… more on that later.
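To keep the running arithmetic honest, here’s the whole bill of materials in one place as a small Python sketch; the prices are the same ballpark figures quoted above, expressed as (low, high) estimates:

```python
# Rough bill of materials for the 10 node build, using the ballpark
# (low, high) prices quoted above. Everything here is an estimate.
parts = {
    "10x HP Chromebox G2 (eBay lot)":  (135, 135),
    "RAM, 16GB or 32GB per node":      (300, 500),
    "24 port gigabit switch":          (100, 100),
    "CAT5e patch cables":              (50, 50),
    "65W power adapters, ~$10-12 ea":  (100, 120),
    "10x 256GB M.2 SATA SSD":          (200, 200),
}

low = sum(lo for lo, _ in parts.values())
high = sum(hi for _, hi in parts.values())
for name, (lo, hi) in parts.items():
    print(f"{name:34s} ${lo:4d} - ${hi:4d}")
print(f"{'Total':34s} ${low:4d} - ${high:4d}")  # $885 - $1105
```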