Redundancy
A mission-critical system has to be available at all times, around the clock, 365 days a year. Downtime is not acceptable, since it can cost the company significant business opportunities or damage its reputation. In a highly distributed application, the likelihood that at least one of the many components involved will fail is non-negligible. The question is not whether a component will fail, but when.
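To see why this likelihood grows quickly with the number of components, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that components fail independently and with the same probability; neither assumption comes from the text, and the function name and the numbers are hypothetical.

```python
def probability_of_any_failure(component_count: int, failure_probability: float) -> float:
    """Probability that at least one of `component_count` components fails.

    Assumes independent failures with an identical per-component
    probability, which is a simplification of real systems.
    """
    # All components survive with probability (1 - p)^n; the complement
    # is the chance that at least one of them fails.
    return 1.0 - (1.0 - failure_probability) ** component_count


# Hypothetical numbers: each component has a 0.1% chance of failing
# during some time window.
for count in (1, 10, 100, 1000):
    chance = probability_of_any_failure(count, 0.001)
    print(f"{count:>5} components: {chance:.1%} chance of at least one failure")
```

Even with a very reliable individual component, the chance of some failure somewhere rises steadily as components are added, which is why failure has to be treated as a normal event.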
To avoid downtime when one of the many components in the system fails, each individual part of the system needs to be redundant. This includes the application components as well as all infrastructure parts. For example, if we have a payment service as part of our application, then we need to run this service redundantly. The easiest way to do that is to run multiple instances of the service on different nodes of our cluster, so that the failure of a single node does not take down every instance. The same applies to infrastructure such as an edge router or a load balancer. We cannot afford for it to ever go down, so the router or load balancer must be redundant as well.
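The idea of running several instances of the payment service on different nodes can be illustrated with a small simulation. This is a minimal sketch, not a real orchestrator or load balancer: the class `ServiceInstance`, the function `call_with_failover`, and the instance and node names are all hypothetical, and a failed instance is modeled with a simple flag.

```python
from dataclasses import dataclass


class InstanceDown(Exception):
    """Raised when a simulated instance cannot serve a request."""


@dataclass
class ServiceInstance:
    """One running copy of a service, placed on a given cluster node."""
    name: str
    node: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise InstanceDown(f"{self.name} on {self.node} is down")
        return f"{self.name}@{self.node} processed {request}"


def call_with_failover(instances, request):
    """Try each redundant instance in order; return the first successful reply."""
    errors = []
    for instance in instances:
        try:
            return instance.handle(request)
        except InstanceDown as exc:
            # Remember the failure and fall through to the next instance.
            errors.append(str(exc))
    raise RuntimeError("all instances failed: " + "; ".join(errors))


# Three instances of the (hypothetical) payment service on three nodes.
payment_instances = [
    ServiceInstance("payment-1", "node-a"),
    ServiceInstance("payment-2", "node-b"),
    ServiceInstance("payment-3", "node-c"),
]

# Simulate the loss of the first instance; the request is still served.
payment_instances[0].healthy = False
reply = call_with_failover(payment_instances, "order-42")
print(reply)
```

In practice this routing is usually handled by the platform (for example a load balancer or the orchestrator's service discovery) rather than by each caller, but the principle is the same: as long as one instance on a healthy node remains, the request succeeds.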