Rebasing CoreOS for ephemeral cloud storage

The convenience and economy of cloud storage is indisputable, but cloud storage also presents an I/O performance challenge. For example, applications that rely too heavily on filesystem semantics and/or shared storage generally need to be rearchitected or at least have their performance reassessed when deployed in public cloud platforms.

Some of the most resilient cloud-based architectures out there minimize disk persistence across most of the solution components and try to consume either tightly engineered managed services (for databases, for examples) or persist in a very specific part of the application. This reality is more evident in container-based architectures, despite many methods to cooperate with the host operating system to provide cross-host volume functionality (i.e., volumes)

Like other public cloud vendors, Azure presents an ephemeral disk to all virtual machines. This device is generally /dev/sdb1 in Linux systems, and is mounted either by the Azure Linux agent or cloud-init in /mnt or /mnt/resource. This is an SSD device local to the rack where the VM is running so it is very convenient to use this device for any application that requires non-permanent persistence with higher IOPS. Users of MySQL, PostgreSQL and other servers regularly use this method for, say, batch jobs.

Today, you can roll out Docker containers in Azure via Ubuntu VMs (the azure-cli and walinuxagent components will set it up for you) or via CoreOS. But a seasoned Ubuntu sysadmin will find that simply moving or symlinking /var/lib/docker to /mnt/resource in a CoreOS instance and restarting Docker won’t cut it to run the containers in a higher IOPS disk. This article is designed to help you do that by explaining a few key concepts that are different in CoreOS.

First of all, in CoreOS stable Docker runs containers on btrfs. /dev/sdb1 is normally formatted with ext4, so you’ll need to unmount it (sudo umount /mnt/resource) and reformat it with btrfs (sudo mkfs.btrfs /dev/sdb1). You could also change Docker’s behaviour so it uses ext4, but it requires more systemd intervention.

Once this disk is formatted with btrfs, you need to tell CoreOS it should use it as /var/lib/docker. You accomplish this by creating a unit that runs before docker.service. This unit can be passed as custom data to the azure-cli agent or, if you have SSH access to your CoreOS instance, by dropping /etc/systemd/system/var-lib-docker.mount (file name needs to match the mountpoint) with the following:

Description=Mount ephemeral to /var/lib/docker

After systemd reloads the unit (for example, by issuing a sudo systemctl daemon-reload) the next time you start Docker, this unit should be called and /dev/sdb1 should be mounted in /var/lib/docker. Try it with sudo systemctl start docker. You can also start var-lib-docker.mount independently. Remember, there’s no service in CoreOS and /etc is largely irrelevant thanks to systemd. If you wanted to use ext4, you’d also have to replace the Docker service unit with your own.

This is a simple way to rebase your entire CoreOS Docker service to an ephemeral mount without using volumes nor changing how prebaked containers write to disk (CoreOS describes something similar for EBS) Just extrapolate this to, say, your striped LVM, RAID 0 or RAID10 for higher IOPS and persistence across reboots. And, while not meant for benchmarking, here’s the difference between the out-of-the-box /var/lib/docker vs. the ephemeral-based one:

# In OS disk

--- . ( ) ioping statistics ---
20 requests completed in 19.4 s, 88 iops, 353.0 KiB/s
min/avg/max/mdev = 550 us / 11.3 ms / 36.4 ms / 8.8 ms

# In ephemeral disk

--- . ( ) ioping statistics ---
15 requests completed in 14.5 s, 1.6 k iops, 6.4 MiB/s
min/avg/max/mdev = 532 us / 614 us / 682 us / 38 us