- The DevOps 2.1 Toolkit: Docker Swarm
- Viktor Farcic
What would Docker Swarm look like without service discovery?
Let's say we have a cluster with three nodes. Two of them run Swarm managers, and one is a worker. Managers accept our requests, decide what should be done, and send tasks to Swarm workers. In turn, workers translate those tasks into commands that are sent to the local Docker Engine. Managers act as workers as well.
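As an illustration of how such a three-node cluster is typically assembled, the sketch below uses the standard Swarm mode commands. The IP addresses and join tokens are placeholders; the real tokens come from `docker swarm join-token manager` and `docker swarm join-token worker` on the first node.

```bash
# A minimal sketch of assembling the three-node cluster described above.
# <node-1-ip>, <manager-token>, and <worker-token> are placeholders.

# On node-1: initialize the cluster and become the first manager.
docker swarm init --advertise-addr <node-1-ip>

# On node-2: join as the second manager.
docker swarm join --token <manager-token> <node-1-ip>:2377

# On node-3: join as a worker.
docker swarm join --token <worker-token> <node-1-ip>:2377
```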
If we repeat the flow we went through earlier with the go-demo service, but imagine that there is no service discovery associated with Swarm, it would look as follows.
A user sends a request to one of the managers. The request is not an imperative instruction but an expression of the desired state. For example, I want to have two instances of the go-demo service and one instance of the DB running inside the cluster.
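Expressed through the Docker CLI, such a request could look roughly like the sketch below. The image names and flags mirror the go-demo example used earlier in the book and are assumptions rather than an exact recipe.

```bash
# A rough sketch of expressing the desired state: one database instance and
# two replicas of the go-demo service (image names are assumptions based on
# the book's go-demo example).
docker network create --driver overlay go-demo

docker service create --name go-demo-db \
    --network go-demo \
    mongo

docker service create --name go-demo \
    --network go-demo \
    --replicas 2 \
    -e DB=go-demo-db \
    vfarcic/go-demo
```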
Once the Swarm manager receives our request for the desired state, it compares it with the current state of the cluster, generates tasks, and sends them to the Swarm workers. The tasks might be to run an instance of the go-demo service on node-1 and node-2, and an instance of the go-demo-db service on node-3.
Swarm workers receive tasks from the managers, translate them into Docker Engine commands, and send them to their local Docker Engine instances.
Docker Engine receives a command from the Swarm worker and executes it.
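Purely as an illustration, the task for one go-demo replica is roughly equivalent to asking the local engine to run a container like the one below. In reality the worker talks to the engine through its API rather than the CLI, and the container names are generated by Swarm, so treat this only as a sketch of the translation step.

```bash
# Illustrative only: roughly what a worker asks its local Docker Engine to do
# for a single go-demo replica (networking details omitted; names and the
# image are assumptions based on the go-demo example).
docker container run -d \
    --name go-demo.1 \
    -e DB=go-demo-db \
    vfarcic/go-demo
```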
Next, let's say that we send a new desired state to the manager. For example, we might want to scale the number of go-demo instances to three. We would send a request to the Swarm manager on node-1, which would consult the cluster state it stored internally and decide to, for example, run a new instance on node-2. Once the decision is made, the manager would create a new task and send it to the Swarm worker on node-2. In turn, the worker would translate the task into a Docker command and send it to the local engine. Once the command is executed, we would have the third instance of the go-demo service running on node-2.
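The scaling request itself would be a single command. A sketch, assuming the service name from the earlier example:

```bash
# Scale the go-demo service to three replicas; the two forms are equivalent.
docker service scale go-demo=3
# or:
docker service update --replicas 3 go-demo
```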
If the flow were as described, we would have quite a lot of problems that would make such a solution almost useless.
Let's try to list some of the issues we would face.
A Swarm manager uses the information we send to it. That would work as long as we always use the same manager and the state of the cluster does not change due to factors outside the manager's control. The important thing to understand is that the information about the cluster is not stored in one place, nor is it complete. Each manager knows only about the things it did itself. Why is that such a problem?
Let's explore a few alternative (but not uncommon) paths.
What would happen if we sent the request to scale to three instances to the manager on node-2? That manager would be oblivious to the tasks created by the manager on node-1. As a result, it would try to run three new instances of the go-demo service, resulting in five instances in total. We'd have two instances created by the manager on node-1 and three by the manager on node-2.
It would be tempting to always use the same manager but, in that case, we would have a single point of failure. What would happen if the whole node-1 failed? We would be forced to use the manager on node-2 which, as we just saw, knows nothing about the instances created through the manager on node-1.
Many other factors might produce such discrepancies. Maybe one of the containers stopped unexpectedly. In such a case, when we decide to scale to three instances, the manager on node-1 would think that two instances are running and would create a task to run one more. However, that would result in only two instances running inside the cluster, not three.
The list of things that might go wrong is infinite, and we won't go into more examples.
The important thing to note is that it is unacceptable for any single manager to hold its state in isolation. Every manager needs to have the same information as every other. At the same time, every node needs to monitor the events generated by its Docker Engine and make sure that any change on its server is propagated to all the managers. Finally, we need to oversee the state of each server in case one of them fails. In other words, each manager needs to have an up-to-date picture of the entire cluster. Only then can it translate our requests for the desired state into tasks that will be dispatched to the Swarm nodes.
How can all the managers have a complete view of the whole cluster no matter who made a change to it?
The answer to that question depends on the requirements we set. We need a place where all the information is stored. Such a place needs to be distributed so that the failure of one server does not affect the correct functioning of the tool. Being distributed provides fault tolerance but, by itself, it does not mean the data is synchronized across the cluster. The tool needs to keep the data replicated across all its instances. Replication is nothing new, except that, in this case, it needs to be fast enough that the services consulting it receive the data in (near) real time. Moreover, we need a system that will monitor each server inside the cluster and update the data whenever anything changes.
To summarize, we need a distributed service registry and a monitoring system in place. The first requirement is best accomplished with one of the service registries or key-value stores. The old Swarm (standalone version before Docker 1.12) supports Consul (https://www.consul.io/), etcd (https://github.com/coreos/etcd), and Zookeeper (https://zookeeper.apache.org/). My preference is towards Consul, but any of the three should do.
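As a rough sketch of how the standalone Swarm was commonly wired to Consul: exact images, ports, and flags varied between versions, so treat the commands below as illustrative rather than a recipe, and note that <consul-ip>, <manager-ip>, and <node-ip> are placeholders.

```bash
# Illustrative only: wiring the standalone (pre-1.12) Swarm to Consul.

# Start a single-node Consul server with its HTTP API on port 8500.
docker run -d --name consul -p 8500:8500 \
    consul agent -server -bootstrap -client=0.0.0.0

# Start a standalone Swarm manager that keeps the cluster state in Consul.
docker run -d -p 4000:4000 swarm manage -H :4000 \
    --replication --advertise <manager-ip>:4000 \
    consul://<consul-ip>:8500

# Register a node in the cluster through the same Consul instance.
docker run -d swarm join \
    --advertise=<node-ip>:2375 consul://<consul-ip>:8500
```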
For a more detailed discussion about service discovery and the comparison of the major service registries, please consult the Service Discovery: The Key to Distributed Services chapter of The DevOps 2.0 Toolkit.