Last week, we learned about CoreOS’ components and technologies within the ecosystem. This week, we’re diving straight into the practical part and getting our hands dirty. If you read the first post, you already know that CoreOS is an operating system geared towards high availability. Setting up a cluster of nodes requires some configuration, and we’ll guide you through the necessary steps to set up your own CoreOS cluster.
Introduction
Within the CoreOS documentation, you can find various guides for running the operating system in different environments. If you want to run CoreOS on DigitalOcean, Amazon EC2, OpenStack, Rackspace, Google Compute Engine, or bare metal, go ahead and check out the available docs. Of course, you can also run CoreOS in virtualization environments. Since we’re just getting started with CoreOS, we might break things, and therefore we use Vagrant as our tool of choice.
Just to make sure: please break things! It’s the best way to understand how components interact within CoreOS’ system. Learning from trial and error is highly appreciated :)
Preparation
CoreOS maintains the coreos-vagrant repository on GitHub which provides a solid basis to get a cluster up and running within minutes. We’ll use this repository to create a local CoreOS cluster. Of course, you can use this tutorial to configure and run your CoreOS cluster on any cloud platform. We decided to use vagrant, because it’s much easier to work locally on your machine when getting started with a new system.
First, make sure you have the requirements installed: VirtualBox and Vagrant.
If you don’t have git installed, download the code from the coreos-vagrant repository on GitHub as an archive (like zip). Unpack the archive and cd into the unpacked folder from the command line.
If you have git installed, clone the coreos-vagrant repository and cd into it.
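With git available, cloning and entering the repository looks like this (using the official coreos/coreos-vagrant project on GitHub):
$ git clone https://github.com/coreos/coreos-vagrant.git
$ cd coreos-vagrant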
With VirtualBox and Vagrant installed, we’re ready to go.
Configuration
Using CoreOS is only useful when running at least three machines; only then do you benefit from one of CoreOS’ main goals: high availability. Electing a leader requires a majority of votes, i.e. floor(n/2) + 1 machines. Running a cluster consisting of just 2 machines, CoreOS isn’t able to reliably decide on a leader: both machines submit their vote for a leader and can’t reach a decision, since there is no majority (50-50).
CoreOS uses etcd to connect the machines within the cluster. Additionally, etcd selects a cluster leader automatically. Every machine that is not the leader is a follower and can take over the leader role if the cluster leader fails, due to hardware issues or any other reason. We’ll explain etcd in more detail within the upcoming article.
Obtain an etcd Discovery Token
To spin up a cluster easily, etcd uses a discovery token. etcd uses an existing cluster to bootstrap a new one and obtains a cluster token from the existing one to connect the machines within the new cluster.
You can use this existing functionality on CoreOS’ public etcd cluster. It exposes a URL to obtain a new discovery token from the existing cluster. You need to predefine the size of your new cluster; etcd uses a default value of 3 if you don’t pass a proper cluster size as a query parameter when using CoreOS’ discovery service.
You can obtain a discovery token with a cluster size of 3 by just using the following URL. The token value is the alphanumeric string at the end of the returned URL.
https://discovery.etcd.io/new
Pass the size of your cluster as a query parameter to the endpoint: use ?size=n and replace n with your desired cluster size. In this guide, we’ll use a cluster size of 4.
https://discovery.etcd.io/new?size=4
The returned URL including the discovery token:
https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649
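You can fetch the token right from your terminal; here is a quick sketch using curl (your returned token will, of course, differ):
$ curl -w "\n" "https://discovery.etcd.io/new?size=4"
https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649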
We’re going to use the discovery URL within our #cloud-config. The following section explains the #cloud-config in more detail.
Cloud-Config
CoreOS uses the #cloud-config to configure parameters for services and machines and to launch systemd units on system boot. The coreos-vagrant repository has an existing user-data.sample file with predefined #cloud-config content. The project recognizes a user-data file within the root directory. That means you need to either copy user-data.sample over to user-data or create a new user-data file.
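Copying the sample file is the quickest path. From the repository root:
# use the sample config as starting point for your own user-data
$ cp user-data.sample user-data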
The content of the user-data file for this guide:
#cloud-config

coreos:
  etcd2:
    # generate a new token for each unique cluster
    # from https://discovery.etcd.io/new?size=n where n = cluster size
    # discovery url to bootstrap the cluster
    discovery: https://discovery.etcd.io/638fa2b0a1ff50075e170080046c8649
    # multi-region and multi-cloud deployments need to use $public_ipv4
    # list of this member’s client urls to advertise to the rest of the cluster
    advertise-client-urls: http://$public_ipv4:2379
    # this address is used to communicate etcd data around the cluster
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    # urls to listen on for client traffic
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    # urls to listen on for peer traffic
    listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
  fleet:
    public-ip: $public_ipv4
  flannel:
    interface: $public_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
If you copy the config above, make sure you replace the discovery token with your own value. The $private_ipv4 and $public_ipv4 variables are substitution variables which Vagrant replaces with the actual machine-specific values.
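Before booting the whole cluster, you can sanity-check the file. CoreOS ships the coreos-cloudinit binary, which offers a validation mode; a small sketch, assuming the -validate and -from-file flags and that you run it on a CoreOS machine:
# validate the syntax and known keys of the cloud-config
$ coreos-cloudinit -validate -from-file=user-data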
config.rb
The basic coreos-vagrant repository also includes an existing config.rb.sample file for further cluster configuration. Actually, we don’t need to copy config.rb.sample over to config.rb and perform further cluster configuration. The only property we would change is the number of machine instances within the cluster, and we define that cluster size value within the Vagrantfile instead.
Vagrantfile
If you previously worked with Vagrant, you know the syntax and options within a Vagrantfile and the ways to configure machines to your needs. If this is your first contact with Vagrant, take a look at the Vagrantfile docs to get a basic understanding.
The Vagrantfile within the coreos-vagrant repository is quite complex, which is why you need at least some fundamentals to understand the details going on with your machines.
However, if you don’t want to mess with Vagrantfile options, just go ahead and open the file. We’re only changing two values before kicking off the cluster.
Find and change the following variables within your Vagrantfile:
$num_instances = 5
$update_channel = "stable"
The $num_instances variable defines the cluster size. We’re starting 5 CoreOS instances, even though we previously defined a cluster size of 4 when obtaining the etcd discovery token. The extra CoreOS instance will fall back to being a proxy node by default.
CoreOS offers three update channels: stable, beta, and alpha. To be honest, it doesn’t really matter which channel you choose when just spinning up your first cluster. Nevertheless, we stay on the safe path and go with the stable version of CoreOS.
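Once a machine is booted, you can double-check which channel it actually follows. A quick check, assuming a stock image where the release group lives in /usr/share/coreos/update.conf (an /etc/coreos/update.conf overrides it if present):
# shows the release channel (GROUP) the machine updates from
$ cat /usr/share/coreos/update.conf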
Start Your Cluster
We’ve finished the required configuration to get our CoreOS cluster up and running. Using Vagrant’s default provider VirtualBox, we start the cluster with the vagrant up command.
The command line output will look like this:
$ vagrant up
Bringing machine 'core-01' up with 'virtualbox' provider...
Bringing machine 'core-02' up with 'virtualbox' provider...
Bringing machine 'core-03' up with 'virtualbox' provider...
Bringing machine 'core-04' up with 'virtualbox' provider...
Bringing machine 'core-05' up with 'virtualbox' provider...
==> core-01: Importing base box 'coreos-stable'...
==> core-01: Matching MAC address for NAT networking...
==> core-01: Checking if box 'coreos-stable' is up to date...
==> core-01: A newer version of the box 'coreos-stable' is available! You currently
==> core-01: have version '717.3.0'. The latest is version '723.3.0'. Run
==> core-01: `vagrant box update` to update.
==> core-01: Setting the name of the VM: coreos-vagrant_core-01_1438938703047_7521
==> core-01: Clearing any previously set network interfaces...
==> core-01: Preparing network interfaces based on configuration...
core-01: Adapter 1: nat
core-01: Adapter 2: hostonly
==> core-01: Forwarding ports...
core-01: 22 => 2222 (adapter 1)
==> core-01: Running 'pre-boot' VM customizations...
==> core-01: Booting VM...
…
Once all 5 machines within the cluster are created and booted by Vagrant, you can check their status:
$ vagrant status
Current machine states:
core-01 running (virtualbox)
core-02 running (virtualbox)
core-03 running (virtualbox)
core-04 running (virtualbox)
core-05 running (virtualbox)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
Every machine is running. Great :)
Let’s check the cluster and machine status within etcd and fleet. You can use vagrant ssh <machine-name> to SSH into any of the created and booted machines.
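For example, to hop onto the first machine:
$ vagrant ssh core-01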
Cluster Members
etcd is responsible for connecting all machines within the cluster. It stores information about the cluster members and automatically selects a leader.
The following commands are executed from within a CoreOS system, so SSH into one of the machines first. Show the list of cluster members with the etcdctl member list command and inspect whether every machine joined correctly during boot.
$ etcdctl member list
bc265403c17a8873: name=01a2ce2426014b6285fc87dc9c2ff8b0 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379
bd41d18de4cae191: name=ce70e12e334045469a392d1900a6f0dd peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379
e14b97603ae78a2d: name=ca89e84c086e4b459fec4d9b458b1e6b peerURLs=http://172.17.8.104:2380 clientURLs=http://172.17.8.104:2379
efd857606dbfcd01: name=c919f394360c4fa78f518f28562af511 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379
The list prints 4 cluster members. Remember that we defined a cluster size of 4 machines while obtaining the discovery token. etcd automatically lets 4 nodes join the cluster and every additional machine falls back to a proxy node.
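If you’re curious which of these members currently acts as the leader, etcd’s v2 stats API can tell you. A quick check from within one of the machines, assuming the default client port 2379:
# the returned JSON contains a state field:
# StateLeader for the leader, StateFollower for everyone else
$ curl -s http://127.0.0.1:2379/v2/stats/self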
Machines Within the Cluster
Since we created 5 CoreOS machines, let’s check whether all nodes booted correctly and are available. Even though our etcd cluster consists of 4 machines, there is 1 proxy node staying in the information loop of etcd: all etcd cluster data is also passed to the proxy node.
Use the fleetctl command line utility to show the list of available machines.
$ fleetctl list-machines --full=true
01a2ce2426014b6285fc87dc9c2ff8b0 172.17.8.101 -
10fb48847b94440dae94054d3b88f44a 172.17.8.105 -
c919f394360c4fa78f518f28562af511 172.17.8.102 -
ca89e84c086e4b459fec4d9b458b1e6b 172.17.8.104 -
ce70e12e334045469a392d1900a6f0dd 172.17.8.103 -
We use the --full=true option to show the full ID of each machine. This way, we can compare the machines within the etcd cluster with all machines generally available (including proxy nodes).
Problem: etcd2 Not Running or Machines Missing Within the Cluster
When starting out with CoreOS, etcd, and fleet, we directly ran into the issue that etcd couldn’t connect to the cluster or other machines. At first, we didn’t know what to do, because we didn’t understand why this error occurred.
$ fleetctl list-machines
Error retrieving list of active machines: googleapi: Error 503: fleet server unable to communicate with etcd
fleet server unable to communicate with etcd
We couldn’t get the machines connected to each other. The first thing we didn’t keep track of: when copying user-data.sample over to user-data, there are config definitions for both etcd and etcd2.
Every new CoreOS release ships with etcd2. Verify that you start etcd2 within the #cloud-config and delete the legacy etcd lines.
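Before touching the config, it’s worth checking the etcd2 unit on an affected machine directly. Two standard systemd commands, run from within a CoreOS node:
# check whether the etcd2 unit is active or failed
$ systemctl status etcd2
# inspect the unit's most recent log lines for connection errors
$ journalctl -u etcd2 --no-pager | tail -n 20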
This issue can also occur for another reason: there are not enough machines within your cluster. You need at least as many machines within the cluster as defined when obtaining the discovery token. Defining a cluster size of 5 machines requires you to start and connect at least 5 etcd instances to the cluster. Only then is your cluster in a healthy state.
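You can verify this at any time with etcdctl’s built-in health check, again from within one of the machines:
# reports the health of each member and of the cluster as a whole
$ etcdctl cluster-health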
Outlook
This guide showed you how to set up a local CoreOS cluster with the help of Vagrant. Don’t hesitate to crash any CoreOS instance or misconfigure the cluster. Make use of the benefits that come with Vagrant.
Next week, we’ll dive more into etcd. We’ll have a look at its internal architecture, configuration options, and the role etcd plays within the CoreOS ecosystem.
Additional Resources
- CoreOS proxy node
- coreos-vagrant repository on GitHub