Last week, we learned about CoreOS’ components and the technologies within its ecosystem. This week, we dive directly into the practical part and get our hands dirty. If you read the first post, you already know that CoreOS is an operating system geared towards high availability. Setting up a cluster of nodes requires some configuration, and we’ll guide you through the necessary steps to set up your own CoreOS cluster.
Within the CoreOS documentation, you can find various guides to run the operating system in different environments. If you want to run CoreOS on DigitalOcean, Amazon EC2, OpenStack, RackSpace, Google Compute Engine, or just bare metal, go ahead and check out the available docs. You can also run CoreOS in virtualization environments. Since we’re just getting started with CoreOS and might break things, we use Vagrant as our tool of choice.
Just to make sure: please break things! It’s the best way to understand how components interact within CoreOS’ system. Learning by trial and error is highly appreciated :)
CoreOS maintains the coreos-vagrant repository on GitHub, which provides a solid basis to get a cluster up and running within minutes. We’ll use this repository to create a local CoreOS cluster. Of course, you can use this tutorial to configure and run your CoreOS cluster on any cloud platform. We decided to use Vagrant because it’s much easier to work locally on your machine when getting started with a new system.
First, make sure you have the requirements installed: VirtualBox and Vagrant.
If you don’t have git installed, download the code from the coreos-vagrant repository on GitHub as an archived file (like zip). Unpack the archive and cd from the command line into the recently unpacked folder.
If you have git installed, clone the coreos-vagrant repository and cd into it.
With VirtualBox and Vagrant installed, we’re ready to go.
CoreOS only pays off when you run at least three machines. Only then do you benefit from one of CoreOS’s main goals: high availability. etcd needs a majority of the cluster members to agree on a leader. In a cluster of 2 machines, a single failure leaves the remaining machine with only half of the votes (50-50), so there is no majority and no leader can be elected.
CoreOS uses etcd to connect machines within the cluster. Additionally, etcd selects a cluster leader automatically. Every machine that is not a leader is a follower and can take over the leader role if the cluster leader fails due to hardware issues or any other reason. We’ll explain etcd in more detail in the upcoming article.
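To make the majority argument concrete, here is a small shell sketch that prints the quorum size and the number of failures a cluster of a given size can survive. The function names quorum and fault_tolerance are our own helpers for illustration, not part of etcd or CoreOS, and the arithmetic assumes the usual majority rule (more than half of the members).

```shell
#!/bin/sh
# Hypothetical helpers (not part of etcd): majority arithmetic for a cluster.

# quorum N: smallest number of members that forms a majority of N
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many members may fail while a majority remains
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for size in 1 2 3 4 5; do
  echo "size=$size quorum=$(quorum "$size") tolerated_failures=$(fault_tolerance "$size")"
done
```

For a size of 2 the tolerated failures come out as 0, which is why three machines are the practical minimum.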
Obtain an Etcd Discovery Token
To spin up a cluster easily, etcd uses a discovery token. etcd uses an existing cluster to create a new one and obtains a cluster token from the existing one to connect machines within the new cluster.
You can use the existing etcd functionality on CoreOS’s etcd cluster. They expose a url to obtain a new discovery token from their existing cluster. You need to predefine the size of your new cluster. The discovery service uses a default value of 3 if you don’t pass a cluster size as a query parameter.
You can obtain a discovery token with the default cluster size of 3 by requesting https://discovery.etcd.io/new without a query parameter. The token value is the alphanumeric string at the end of the returned url.
Pass the size of your cluster as a query parameter to the endpoint. Use ?size=n and replace n with your desired cluster size. In this guide, we’ll use a cluster size of 4.
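As a sketch of that request from the command line: the endpoint https://discovery.etcd.io/new?size=n is the one named in the cloud-config comments of this guide. The helper names discovery_request_url and token_from_url are our own, and the curl call is left commented out because it needs network access and assumes the public discovery service is reachable.

```shell
#!/bin/sh
# Hypothetical helpers for working with the etcd discovery service.

# Build the request url for a new discovery token with the given cluster size.
discovery_request_url() {
  echo "https://discovery.etcd.io/new?size=$1"
}

# The token is everything after the last "/" of the returned url.
token_from_url() {
  echo "${1##*/}"
}

echo "request: $(discovery_request_url 4)"

# Requires network access; uncomment to request a real token:
# discovery_url=$(curl -s "$(discovery_request_url 4)")

# Example value taken from this guide's user-data file:
discovery_url="https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649"
echo "token: $(token_from_url "$discovery_url")"
```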
The returned url including the discovery token: https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649
We’re going to use the discovery url within our #cloud-config. The following section explains the #cloud-config in more detail.
CoreOS uses the #cloud-config to configure parameters for services and machines and to launch systemd units on system boot. The coreos-vagrant repository has an existing user-data.sample file with predefined #cloud-config content. The project will recognize a user-data file within the root directory. That means you need to either copy the user-data.sample over to user-data or just create a new one.
The content of the user-data file for this guide:
#cloud-config

coreos:
  etcd2:
    # generate a new token for each unique cluster
    # from https://discovery.etcd.io/new?size=n where n = cluster size
    # discovery url to bootstrap the cluster
    discovery: https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649
    # multi-region and multi-cloud deployments need to use $public_ipv4
    # list of member’s client urls to advertise information to the rest of the cluster
    advertise-client-urls: http://$public_ipv4:2379
    # this address is used to communicate etcd data around the cluster
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    # url to listen for client traffic
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    # url to listen for peer traffic
    listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
  fleet:
    public-ip: $public_ipv4
  flannel:
    interface: $public_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
If you copy the config above, make sure you replace the token at the end of the discovery url with your own discovery token value. The $public_ipv4 and $private_ipv4 variables are substitution variables which will be replaced by Vagrant with the actual machine-specific values.
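One way to do that replacement without opening an editor is sketched below. The helper set_discovery_url is our own invention, not part of CoreOS or Vagrant. It rewrites the value on every line that starts with discovery: and prints the result instead of editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with its "discovery:" value replaced by URL.
# Assumes the url contains neither "|" (used as the sed delimiter) nor "&".
set_discovery_url() {
  file=$1
  url=$2
  sed "s|^\([[:space:]]*discovery:\).*|\1 $url|" "$file"
}

# Demo on a throwaway file that mimics the relevant part of user-data.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
#cloud-config
coreos:
  etcd2:
    discovery: https://discovery.etcd.io/<token>
    advertise-client-urls: http://$public_ipv4:2379
EOF

set_discovery_url "$sample" "https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649"
```

Inside the coreos-vagrant folder, the same helper could be used as set_discovery_url user-data.sample "<your discovery url>" > user-data.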
The coreos-vagrant repository has an existing config.rb.sample file for further cluster configuration. Actually, we don’t need to copy it over to config.rb and perform further cluster configuration. The only property we’re going to change is the number of machine instances within the cluster. We define the cluster size value within the Vagrantfile.
If you previously worked with Vagrant, you know the syntax and options within a Vagrantfile and ways to configure machines to your needs. If you’re new to Vagrant, take a look at the Vagrantfile docs to get a basic understanding.
The Vagrantfile within the coreos-vagrant repository is quite complex. That’s why you need at least some fundamentals to understand the details of what is going on with your machines.
However, if you don’t want to mess with options for Vagrantfiles, just go ahead and open the file. We’ll just change two values and then kick off the cluster.
Find and change the following variables within your Vagrantfile:
$num_instances = 5
$update_channel = "stable"
The $num_instances variable defines the number of machines to start. We’re starting 5 etcd instances, even though we defined a cluster size of 4 previously when obtaining the etcd discovery token. The extra CoreOS instance will fall back to being a proxy node by default.
CoreOS offers three update channels: stable, beta, and alpha. To be honest, it doesn’t really matter which channel you choose when just spinning up the first cluster. Nevertheless, we play it safe and go with the stable version of CoreOS.
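The relationship between the two numbers can be written down as simple arithmetic. This sketch uses made-up helper names and assumes the behaviour described above: machines up to the discovery size become etcd members, and any surplus machines become proxies.

```shell
#!/bin/sh
# Hypothetical helpers; "instances" is $num_instances from the Vagrantfile,
# "size" is the cluster size requested from the discovery service.

# Number of machines that join as full etcd members.
# Note: with fewer instances than "size", the cluster is incomplete.
member_count() {
  instances=$1; size=$2
  if [ "$instances" -lt "$size" ]; then echo "$instances"; else echo "$size"; fi
}

# Number of surplus machines that fall back to proxy mode.
proxy_count() {
  instances=$1; size=$2
  if [ "$instances" -gt "$size" ]; then echo $(( instances - size )); else echo 0; fi
}

echo "members=$(member_count 5 4) proxies=$(proxy_count 5 4)"
```

With the values of this guide (5 instances, discovery size 4) this prints members=4 proxies=1.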
Start Your Cluster
We’ve finished the required configuration to get our CoreOS cluster up and running. Using Vagrant’s default provider VirtualBox, we start the cluster using the vagrant up command.
The command line output will look like this:
$ vagrant up
Bringing machine 'core-01' up with 'virtualbox' provider...
Bringing machine 'core-02' up with 'virtualbox' provider...
Bringing machine 'core-03' up with 'virtualbox' provider...
Bringing machine 'core-04' up with 'virtualbox' provider...
Bringing machine 'core-05' up with 'virtualbox' provider...
==> core-01: Importing base box 'coreos-stable'...
==> core-01: Matching MAC address for NAT networking...
==> core-01: Checking if box 'coreos-stable' is up to date...
==> core-01: A newer version of the box 'coreos-stable' is available! You currently
==> core-01: have version '717.3.0'. The latest is version '723.3.0'. Run
==> core-01: `vagrant box update` to update.
==> core-01: Setting the name of the VM: coreos-vagrant_core-01_1438938703047_7521
==> core-01: Clearing any previously set network interfaces...
==> core-01: Preparing network interfaces based on configuration...
    core-01: Adapter 1: nat
    core-01: Adapter 2: hostonly
==> core-01: Forwarding ports...
    core-01: 22 => 2222 (adapter 1)
==> core-01: Running 'pre-boot' VM customizations...
==> core-01: Booting VM...
…
Once all 5 machines within the cluster are created and booted by Vagrant, you can check their status:
$ vagrant status
Current machine states:

core-01                   running (virtualbox)
core-02                   running (virtualbox)
core-03                   running (virtualbox)
core-04                   running (virtualbox)
core-05                   running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
Every machine is running. Great :)
Let’s check the cluster and machine status within etcd and fleet. You can use vagrant ssh <machine-name> to ssh into any of the created and booted machines.
etcd is responsible for connecting all machines within the cluster. It stores information about the cluster members and automatically selects a leader.
The following commands are executed from within a CoreOS system. SSH into one of the machines and run them. Show the list of cluster members with the command etcdctl member list and check whether every machine joined correctly during boot.
$ etcdctl member list
bc265403c17a8873: name=01a2ce2426014b6285fc87dc9c2ff8b0 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379
bd41d18de4cae191: name=ce70e12e334045469a392d1900a6f0dd peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379
e14b97603ae78a2d: name=ca89e84c086e4b459fec4d9b458b1e6b peerURLs=http://172.17.8.104:2380 clientURLs=http://172.17.8.104:2379
efd857606dbfcd01: name=c919f394360c4fa78f518f28562af511 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379
The list prints 4 cluster members. Remember that we defined a cluster size of 4 machines while obtaining the discovery token. etcd automatically lets 4 nodes join the cluster, and every additional machine falls back to a proxy node.
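If you prefer a number over reading the list, the output can be counted in the shell. The sketch below feeds the sample output from above through a pipeline; on a real machine you would pipe etcdctl member list into the same function. The helper name count_members is our own.

```shell
#!/bin/sh
# Hypothetical helper: count etcd members by counting lines with peerURLs.
count_members() {
  grep -c 'peerURLs='
}

# Sample output copied from this guide. On a CoreOS machine, use
# `etcdctl member list | count_members` instead.
sample_output='bc265403c17a8873: name=01a2ce2426014b6285fc87dc9c2ff8b0 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379
bd41d18de4cae191: name=ce70e12e334045469a392d1900a6f0dd peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379
e14b97603ae78a2d: name=ca89e84c086e4b459fec4d9b458b1e6b peerURLs=http://172.17.8.104:2380 clientURLs=http://172.17.8.104:2379
efd857606dbfcd01: name=c919f394360c4fa78f518f28562af511 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379'

members=$(printf '%s\n' "$sample_output" | count_members)
echo "etcd members: $members"
```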
Machines Within the Cluster
Since we created 5 CoreOS machines, let’s check whether all nodes booted correctly and are available. Even though our cluster consists of 4 machines, there is 1 proxy node staying in the information loop of etcd. All etcd cluster data is also passed to the proxy node.
Use the fleetctl command line utility to show the list of machines available.
$ fleetctl list-machines --full=true
01a2ce2426014b6285fc87dc9c2ff8b0 172.17.8.101 -
10fb48847b94440dae94054d3b88f44a 172.17.8.105 -
c919f394360c4fa78f518f28562af511 172.17.8.102 -
ca89e84c086e4b459fec4d9b458b1e6b 172.17.8.104 -
ce70e12e334045469a392d1900a6f0dd 172.17.8.103 -
We use the --full=true option to show the full id of each machine. This way, we can compare the machines within the etcd cluster and machines generally available (including proxy nodes).
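That comparison can also be done mechanically. The following sketch takes the two sample outputs shown in this guide and prints every fleet machine id that does not appear as an etcd member name, which should be the proxy node. The helper name find_proxies is our own, and the sketch assumes machine ids are lowercase hexadecimal strings as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print fleet machine ids that are not etcd member names.
# $1 = output of `etcdctl member list`
# $2 = output of `fleetctl list-machines --full=true`
find_proxies() {
  members=$(printf '%s\n' "$1" | sed -n 's/.*name=\([^ ]*\).*/\1/p')
  printf '%s\n' "$2" \
    | awk '$1 ~ /^[0-9a-f]+$/ { print $1 }' \
    | grep -v -x -F -e "$members"
}

# Sample outputs copied from this guide.
etcd_out='bc265403c17a8873: name=01a2ce2426014b6285fc87dc9c2ff8b0 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379
bd41d18de4cae191: name=ce70e12e334045469a392d1900a6f0dd peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379
e14b97603ae78a2d: name=ca89e84c086e4b459fec4d9b458b1e6b peerURLs=http://172.17.8.104:2380 clientURLs=http://172.17.8.104:2379
efd857606dbfcd01: name=c919f394360c4fa78f518f28562af511 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379'

fleet_out='01a2ce2426014b6285fc87dc9c2ff8b0 172.17.8.101 -
10fb48847b94440dae94054d3b88f44a 172.17.8.105 -
c919f394360c4fa78f518f28562af511 172.17.8.102 -
ca89e84c086e4b459fec4d9b458b1e6b 172.17.8.104 -
ce70e12e334045469a392d1900a6f0dd 172.17.8.103 -'

echo "proxy nodes:"
find_proxies "$etcd_out" "$fleet_out"
```

For the sample data, the only id left over is the machine with the address 172.17.8.105.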
Problem: etcd2 Not Running or Machines Missing Within the Cluster
When starting out with CoreOS, etcd and fleet, we directly ran into the issue that etcd couldn’t connect to the cluster or other machines. At first, we didn’t know what to do, because we didn’t understand why this error occurred.
$ fleetctl list-machines
Error retrieving list of active machines: googleapi: Error 503: fleet server unable to communicate with etcd
We couldn’t get the machines connected to each other. The first thing we didn’t keep track of: when copying the user-data.sample over to user-data, there is a config definition for both etcd and etcd2.
Every new CoreOS release ships with etcd2. Verify that you start etcd2 within the #cloud-config and delete the etcd lines.
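A quick way to spot leftover legacy settings is to search the file for them. The sketch below uses a helper name of our own and assumes the legacy section is declared as an etcd: key and the legacy unit as etcd.service. The demo file is a made-up stand-in for a user-data file that still contains both variants, not a copy of the real sample.

```shell
#!/bin/sh
# Hypothetical helper: list lines of a cloud-config that refer to legacy etcd
# (an "etcd:" section or the "etcd.service" unit) but not to etcd2.
legacy_etcd_lines() {
  grep -n -E '^[[:space:]]*(-[[:space:]]*name:[[:space:]]*etcd\.service|etcd:)' "$1"
}

# Demo on a throwaway file (assumed layout, for illustration only).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
#cloud-config
coreos:
  etcd:
    discovery: https://discovery.etcd.io/<token>
  etcd2:
    discovery: https://discovery.etcd.io/<token>
  units:
    - name: etcd.service
      command: start
    - name: etcd2.service
      command: start
EOF

legacy_etcd_lines "$sample"
```

Every line it reports is a candidate for deletion. If the command prints nothing for your user-data file, only etcd2 is configured.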
This issue can also occur for another reason: there are not enough machines within your cluster. You need at least as many machines within the cluster as defined when obtaining the discovery token. Defining a cluster size of 5 machines requires you to start and connect at least 5 etcd instances to the cluster. Only then is your cluster in a healthy state.
This guide showed you how to set up your local CoreOS cluster with the help of Vagrant. Don’t hesitate to crash any CoreOS instance or misconfigure the cluster. Make use of the benefits that come with Vagrant.
Next week, we’ll dive more into etcd. We’ll have a look at its internal architecture, configuration options, and the role etcd plays within the CoreOS ecosystem.