Setting up Consul as service registry inside a Swarm cluster

As before, we'll start by setting up a Swarm cluster. From there on, we'll proceed with the Consul setup and a quick overview of the basic operations we can do with it. That will give us the knowledge necessary for the rest of this chapter.

A note to The DevOps 2.0 Toolkit readers
You might be tempted to skip this sub-chapter since you already learned how to set up Consul. I recommend you read on. We'll use the official Consul image that was not available at the time I wrote the previous book. At the same time, I promise to keep this sub-chapter as brief as possible without confusing the new readers too much.

Practice makes perfect, but there is a limit after which there is no reason to repeat the same commands over and over. I'm sure that, by now, you got tired of writing the commands that create a Swarm cluster. So, I prepared the scripts/dm-swarm.sh (https://github.com/vfarcic/cloud-provisioning/blob/master/scripts/dm-swarm.sh) script that will create Docker Machine nodes and join them into a Swarm cluster.

All the commands from this chapter are available in the 04-service-discovery.sh (https://gist.github.com/vfarcic/fa57e88faf09651c9a7e9e46c8950ef5) Gist.

Some of the files will be shared between the host file system and the Docker Machines we'll create soon. Docker Machine makes the whole directory that belongs to the current user available inside the VM. Therefore, please make sure that the code is cloned inside one of the user's sub-folders.

Let's clone the code and run the script:

git clone https://github.com/vfarcic/cloud-provisioning.git

cd cloud-provisioning

scripts/dm-swarm.sh

eval $(docker-machine env swarm-1)

docker node ls

The output of the node ls command is as follows (IDs are removed for brevity):

HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
swarm-2   Ready   Active        Reachable
swarm-3   Ready   Active        Reachable
swarm-1   Ready   Active        Leader

Please note that this time there was a slight change in the commands. We used the manager token so that all three nodes are set up as managers.

As a general rule, we should have at least three Swarm managers. That way, if one of them fails, the others will reschedule the failed containers and can be used as our access points to the system. As is often the case with solutions that require a quorum, an odd number is usually the best. Hence, we have three.

You might be tempted to run all nodes as managers. I advise you against that. Managers synchronize data between themselves. The more manager instances are running, the longer the synchronization might take. While that is not even noticeable when there are only a few, if, for example, you ran a hundred managers, there would be some lag. After all, that's why we have workers. Managers are our entry points to the system and coordinators of the tasks, while workers do the actual work.
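The reasoning about odd numbers can be made concrete with a bit of arithmetic. The sketch below assumes a simple majority quorum (more than half of the managers must be available). The function names quorum and tolerated_failures are made up for this illustration and are not part of any Docker tooling.

```shell
#!/bin/sh
# Assumption: a quorum is a simple majority, floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of managers that can fail while the rest still form a quorum.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under that assumption, three and four managers both tolerate a single failure, so the fourth manager adds synchronization overhead without adding resilience.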

With that out of the way, we can proceed and set up Consul.

We'll start by downloading the docker-compose.yml (https://github.com/vfarcic/docker-flow-proxy/blob/master/docker-compose.yml) file from the Docker Flow Proxy (https://github.com/vfarcic/docker-flow-proxy) project. It already contains Consul defined as Compose services.

curl -o docker-compose-proxy.yml \
https://raw.githubusercontent.com/\
vfarcic/docker-flow-proxy/master/docker-compose.yml

cat docker-compose-proxy.yml

Just as a Docker Swarm node can act as a manager or a worker, Consul can be run as a server or an agent. We'll start with the server.

The Compose definition of the Consul service that acts as a server is as follows:

consul-server:
  container_name: consul
  image: consul
  network_mode: host
  environment:
    - 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}'
  command: agent -server -bind=$DOCKER_IP -bootstrap-expect=1 -client=$DOCKER_IP

The important thing to note is that we set up the network mode as host. That means that the container will share the same network as the host it is running on. This is followed by an environment variable and the command.

The command will run the agent in server mode and, initially, it expects to be the only server in the cluster (-bootstrap-expect=1).

You'll notice the usage of the DOCKER_IP environment variable. Consul expects the information about the binding and the client address. Since we don't know the IP of the servers in advance, it had to be a variable.

At this moment, you might be wondering why we are talking about Docker Compose services inside a Swarm cluster. Shouldn't we run the docker service create command? The truth is, at the time of this writing, the official consul image is still not adapted to the "Swarm way" of running things. Most images do not require any changes before launching them inside a Swarm cluster. Consul is one of the very few exceptions. I will do my best to update the instructions as soon as the situation changes. Until then, the good old Compose should do:

export DOCKER_IP=$(docker-machine ip swarm-1)

docker-compose -f docker-compose-proxy.yml \
up -d consul-server

You'll notice the "WARNING: The Docker Engine you're using is running in swarm mode" message in the output. It is only a friendly reminder that we are not running this as a Docker service. Feel free to ignore it.
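Since the Compose definition depends on DOCKER_IP being exported, a small guard can catch a missing variable before Compose is invoked. What follows is only a sketch: require_vars is a hypothetical helper and is not part of the cloud-provisioning or docker-flow-proxy repositories.

```shell
#!/bin/sh
# require_vars NAME...
# Hypothetical helper: returns non-zero if any named variable is unset or empty.
require_vars() {
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            return 1
        fi
    done
}

# Example usage (requires the cluster from this chapter):
# require_vars DOCKER_IP && \
#     docker-compose -f docker-compose-proxy.yml up -d consul-server
```

The same guard could be called with both DOCKER_IP and CONSUL_SERVER_IP before starting a Consul agent.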

Now that we have a Consul instance running, we can go through the basic operations.

We can, for example, put some information into the key-value store:

curl -X PUT -d 'this is a test' \
"http://$(docker-machine ip swarm-1):8500/v1/kv/msg1"

The curl command stored the value "this is a test" under the msg1 key inside Consul.

We can confirm that the key-value combination is indeed stored by sending a GET request:

curl "http://$(docker-machine ip swarm-1):8500/v1/kv/msg1"

The output is as follows (formatted for readability):

[
{
"LockIndex": 0,
"Key": "msg1",
"Flags": 0,
"Value": "dGhpcyBpcyBhIHRlc3Q=",
"CreateIndex": 17,
"ModifyIndex": 17
}
]

You'll notice that the value is Base64-encoded. If we add the raw parameter to the request, Consul will return only the value in its raw format:

curl "http://$(docker-machine ip swarm-1):8500/v1/kv/msg1?raw"

The output is as follows:

this is a test
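The Value field from the earlier JSON response can also be decoded by hand, which confirms that it is the same text in Base64 form. The snippet below is a minimal sketch. decode_value is a name made up for this example, and it assumes a base64 utility that accepts the --decode option.

```shell
#!/bin/sh
# Value copied from the "Value" field of the response shown earlier.
encoded='dGhpcyBpcyBhIHRlc3Q='

# Hypothetical helper: decodes a single Base64 string passed as an argument.
decode_value() {
    printf '%s' "$1" | base64 --decode
}

decode_value "$encoded"
echo
```

The result matches the output of the request that used the raw parameter.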

Right now, we have only one Consul instance. If the node it is running on (swarm-1) fails, all the data will be lost and the service registry will be unavailable. That's not a good situation to be in.

We can create fault tolerance by running a few more Consul instances. This time, we'll run agents.

Just like the Consul server instance, the agent is also defined in the docker-compose.yml (https://github.com/vfarcic/docker-flow-proxy/blob/master/docker-compose.yml) file in the Docker Flow Proxy (https://github.com/vfarcic/docker-flow-proxy) project. Remember, we downloaded it with the name docker-compose-proxy.yml. Let's take a look at the service definition:

cat docker-compose-proxy.yml

The part of the output that defines the consul-agent service is as follows:

consul-agent:
  container_name: consul
  image: consul
  network_mode: host
  environment:
    - 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
  command: agent -bind=$DOCKER_IP -retry-join=$CONSUL_SERVER_IP -client=$DOCKER_IP

It is almost the same as the definition we used to run the Consul server instance. The only important difference is that the -server argument is missing and that we have the -retry-join argument. We're using the latter to specify the address of another instance. Consul uses a gossip protocol to manage cluster membership. As long as every instance is aware of at least one other instance, the protocol will propagate that information across all of them.

Let's run agents on the other two nodes (swarm-2 and swarm-3):

export CONSUL_SERVER_IP=$(docker-machine ip swarm-1)

for i in 2 3; do
    eval $(docker-machine env swarm-$i)

    export DOCKER_IP=$(docker-machine ip swarm-$i)

    docker-compose -f docker-compose-proxy.yml \
        up -d consul-agent
done

Now that we have three Consul instances running inside the cluster (one on each node), we can confirm that gossip indeed works.
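Containers started with up -d may need a moment before Consul answers requests, so a short polling loop can be useful before querying the agents. This is only a sketch: wait_until is a hypothetical helper, and the attempt count and delay in the commented example are arbitrary values. The example polls Consul's /v1/status/leader endpoint on each node.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or ATTEMPTS run out.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (requires the cluster from this chapter):
# for i in 1 2 3; do
#     wait_until 10 2 curl -sf \
#         "http://$(docker-machine ip swarm-$i):8500/v1/status/leader"
# done
```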

Let's request the value of the msg1 key. This time, we'll request it from the Consul instance running on swarm-2:

curl "http://$(docker-machine ip swarm-2):8500/v1/kv/msg1"

As you can see from the output, even though we put the information to the instance running on swarm-1, it is available from the instance in swarm-2. The information is propagated through all the instances.

We can give the gossip protocol one more round of testing:

curl -X PUT -d 'this is another test' \
"http://$(docker-machine ip swarm-2):8500/v1/kv/messages/msg2"

curl -X PUT -d 'this is a test with flags' \
"http://$(docker-machine ip swarm-3):8500/v1/kv/messages/msg3?flags=1234"

curl "http://$(docker-machine ip swarm-1):8500/v1/kv/?recurse"

We sent one PUT request to the instance running in swarm-2 and another to the instance in swarm-3. When we requested all the keys from the instance running in swarm-1, all three were returned. In other words, no matter which instance we send a request to, we always see the same data.
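All the requests so far follow the same URL pattern: the node address, port 8500, the /v1/kv/ prefix, the key, and an optional query string such as raw, recurse, or flags=1234. A tiny helper can assemble that URL in one place so that no stray whitespace ends up inside it. This is a sketch only: kv_url is a hypothetical function, and the IP address below is a placeholder rather than a real node address.

```shell
#!/bin/sh
# kv_url HOST KEY [QUERY]
# Hypothetical helper: prints the key-value URL for a Consul instance.
kv_url() {
    url="http://$1:8500/v1/kv/$2"
    if [ -n "${3:-}" ]; then
        url="$url?$3"
    fi
    printf '%s\n' "$url"
}

# 192.168.99.100 is a placeholder address used only for this example.
kv_url 192.168.99.100 messages/msg3 flags=1234

# Example usage against the cluster from this chapter:
# curl "$(kv_url "$(docker-machine ip swarm-1)" "" recurse)"
```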

Similarly, we can delete information:

curl -X DELETE \
"http://$(docker-machine ip swarm-2):8500/v1/kv/?recurse"


curl "http://$(docker-machine ip swarm-3):8500/v1/kv/?recurse"

We sent a request to swarm-2 to delete all keys. When we queried the instance running in swarm-3, we got an empty response, meaning that everything is, indeed, gone.

With a setup similar to the one we explored, we can have a reliable, distributed, and fault-tolerant way for storing and retrieving any information our services might need.

We'll use this knowledge to explore a possible solution for some of the problems that might arise when running stateful services inside a Swarm cluster. But before we start discussing the solution, let's see what the problem is with stateful services.