Discussing load balancing with HAProxy

When an application becomes popular, its server receives a growing number of requests, and a single application server may not be able to handle the entire load alone. We can always scale up the underlying hardware, that is, add more memory and more powerful CPUs to increase server capacity, but these improvements do not always scale linearly. To solve this problem, we create multiple replicas of the application server and distribute the load among them. Load balancing can be implemented at OSI Layer 4, that is, at the TCP or UDP protocol level, or at Layer 7, the application level, with protocols such as HTTP, SMTP, and DNS.

In this recipe, we will install HAProxy, a popular load balancing service. HAProxy receives all connections from clients and forwards each one to one of the application servers for processing. The application server sends its response back through HAProxy, which relays it to the client. We will configure HAProxy to load balance TCP connections.

Getting ready

You will need two or more application servers and one server for HAProxy:

  • You will need root access on the server where you want to install HAProxy
  • It is assumed that your application servers are properly installed and working

How to do it…

Follow these steps to set up load balancing with HAProxy:

  1. Install HAProxy:
    $ sudo apt-get update
    $ sudo apt-get install haproxy
    
  2. Enable the HAProxy init script to start HAProxy automatically on system boot. Open /etc/default/haproxy and set ENABLED to 1:
    ENABLED=1

  3. Now, edit the HAProxy configuration file, /etc/haproxy/haproxy.cfg. You may want to create a backup copy of this file before editing:
    $ cd /etc/haproxy
    $ sudo cp haproxy.cfg haproxy.cfg.copy
    $ sudo nano haproxy.cfg
    
  4. Find the defaults section and change the mode and option parameters to match the following:
    mode tcp
    option tcplog
    
  5. Next, define a frontend, which will receive all incoming requests:
    frontend www
     bind 57.105.2.204:80 # haproxy public IP
     default_backend as-backend # backend used
    
  6. Define the backend application servers:
    backend as-backend
     balance leastconn
     mode tcp
     server as1 10.0.2.71:80 check # application srv 1
     server as2 10.0.2.72:80 check # application srv 2
    
  7. Save and quit the HAProxy configuration file.
  8. We need to set up rsyslog to accept HAProxy logs over UDP. Open /etc/rsyslog.conf and uncomment the following lines:
    $ModLoad imudp
    $UDPServerRun 514
    
  9. Next, create a new file under /etc/rsyslog.d to specify the HAProxy log location:
    $ sudo nano /etc/rsyslog.d/haproxy.conf
    
  10. Add the following line to the newly created file:
    local2.* /var/log/haproxy.log
    
  11. Save the file and exit the editor.
  12. Restart the rsyslog service:
    $ sudo service rsyslog restart
    
  13. Restart HAProxy:
    $ sudo service haproxy restart
    
  14. Now you should be able to reach your application servers through the HAProxy IP address.
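If you prefer to script step 2 rather than edit the file by hand, the following is a minimal sketch. It assumes the Debian/Ubuntu convention of an ENABLED=0/1 line in /etc/default/haproxy and GNU sed; the function name enable_haproxy_boot is hypothetical. On newer, systemd-based Ubuntu releases this variable may be ignored, and sudo systemctl enable haproxy is the usual way to start the service at boot there.

```shell
#!/usr/bin/env bash
# Sketch: set ENABLED=1 in the HAProxy defaults file non-interactively.
# Assumes the Debian/Ubuntu "ENABLED=0|1" convention and GNU sed.
set -euo pipefail

# enable_haproxy_boot is a hypothetical helper; the path defaults to the
# file named in step 2 but can be overridden (useful for testing).
enable_haproxy_boot() {
  local defaults_file="${1:-/etc/default/haproxy}"
  if grep -q '^ENABLED=' "$defaults_file"; then
    # Rewrite the existing ENABLED=... line in place.
    sed -i 's/^ENABLED=.*/ENABLED=1/' "$defaults_file"
  else
    # No ENABLED line yet, so append one.
    echo 'ENABLED=1' >> "$defaults_file"
  fi
}
```

Run it as root against the real file, for example by adding enable_haproxy_boot as the last line of the script and executing it with sudo bash.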

How it works…

Here, we have configured HAProxy as a frontend for a cluster of application servers. Under the frontend section, we configured HAProxy to listen on the public IP address of the HAProxy server and named the backend that receives its traffic. Under the backend section, we listed the private IP addresses of the application servers, so HAProxy communicates with them over the private network interface. This keeps traffic between HAProxy and the application servers off the public network and helps keep internal latency low.
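When the number of application servers grows, the frontend and backend sections from steps 5 and 6 can be generated rather than typed. The sketch below is a hypothetical helper, render_haproxy_tcp, that prints those two sections for any list of private ip:port pairs, numbering the servers as1, as2, and so on. It only produces text; it does not validate it.

```shell
#!/usr/bin/env bash
# Sketch: print the frontend/backend part of haproxy.cfg for a TCP cluster.
# render_haproxy_tcp and its argument order are hypothetical.
set -euo pipefail

# Usage: render_haproxy_tcp <public_ip:port> <server_ip:port> [...]
render_haproxy_tcp() {
  local bind_addr="$1"; shift   # address HAProxy listens on
  local i=1 srv
  printf 'frontend www\n'
  printf '    bind %s\n' "$bind_addr"
  printf '    default_backend as-backend\n\n'
  printf 'backend as-backend\n'
  printf '    balance leastconn\n'
  printf '    mode tcp\n'
  for srv in "$@"; do           # each remaining argument is a private ip:port
    printf '    server as%d %s check\n' "$i" "$srv"
    i=$((i + 1))
  done
}
```

For example, render_haproxy_tcp 57.105.2.204:80 10.0.2.71:80 10.0.2.72:80 prints the same blocks as steps 5 and 6. After adding the output to /etc/haproxy/haproxy.cfg, haproxy -c -f /etc/haproxy/haproxy.cfg checks the file for errors before you restart the service.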

HAProxy supports various load balancing algorithms. Some of them are as follows:

  • roundrobin sends each new connection to the next server in the list, in turn. This is the default algorithm.
  • leastconn selects the backend server with the fewest active connections.
  • source hashes the client's IP address and maps the result to a backend server. As long as no server goes down or comes up, requests from the same client IP address are always served by the same backend server.
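To make the difference between roundrobin and source concrete, here is a toy model in bash. It is an illustration only: HAProxy uses its own hash function and takes server weights into account, whereas this sketch substitutes cksum and treats all servers equally. The server names, function names, and client IP are made up.

```shell
#!/usr/bin/env bash
# Toy model of two HAProxy balancing strategies (illustration only).
# HAProxy's real hashing and server weights are NOT reproduced here.
set -euo pipefail

SERVERS=(as1 as2 as3)   # hypothetical backend server names
RR_NEXT=0               # index of the server roundrobin picks next
PICKED=""               # result of the last pick_roundrobin call

# roundrobin: hand out servers in turn, wrapping around at the end.
# Stores its answer in PICKED so the cursor survives (no subshell).
pick_roundrobin() {
  PICKED=${SERVERS[RR_NEXT]}
  RR_NEXT=$(( (RR_NEXT + 1) % ${#SERVERS[@]} ))
}

# source: hash the client IP (cksum stands in for HAProxy's hash) and
# take it modulo the server count, so one IP always maps to one server
# as long as the server list does not change.
pick_source() {
  local client_ip="$1" hash idx
  hash=$(printf '%s' "$client_ip" | cksum | cut -d' ' -f1)
  idx=$(( hash % ${#SERVERS[@]} ))
  echo "${SERVERS[idx]}"
}
```

Calling pick_roundrobin repeatedly cycles through as1, as2, as3 and back to as1, while pick_source returns the same server for the same client IP on every call. Adding or removing a server changes the modulus, which is why source stickiness holds only while the server list is stable.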

We have selected the leastconn algorithm with the balance leastconn line in the backend section. The right algorithm depends on the type of application and the length of its connections: leastconn suits long-lived sessions such as database or LDAP connections, while roundrobin is usually a better fit for short requests such as HTTP.
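Similarly, a toy version of leastconn only needs the current connection count of each server. In this sketch the counts are passed in as hypothetical name=count arguments; HAProxy tracks them itself, applies server weights, and rotates among equally loaded servers, whereas here ties simply go to the first server listed.

```shell
#!/usr/bin/env bash
# Toy leastconn (illustration only): pick the server with the fewest
# active connections. Counts are supplied by the caller; HAProxy tracks
# them internally. Ties go to the first server listed.
set -euo pipefail

# Usage: pick_leastconn name=count [name=count ...]
pick_leastconn() {
  local best="" best_count=-1 pair name count
  for pair in "$@"; do
    name=${pair%%=*}     # text before "="
    count=${pair#*=}     # text after "="
    if (( best_count < 0 || count < best_count )); then
      best=$name
      best_count=$count
    fi
  done
  echo "$best"
}
```

For example, pick_leastconn as1=12 as2=3 as3=7 prints as2.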

Lastly, we configured rsyslog to accept logs over UDP. HAProxy does not write its own log files; it sends log messages over UDP to the system log daemon, rsyslog, which writes them to the file defined in /etc/rsyslog.d/haproxy.conf.
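Each syslog message sent over UDP starts with a priority value computed as facility * 8 + severity (RFC 3164), and rsyslog reads the facility from it to match selectors such as local2.* in step 10. For that rule to catch HAProxy's messages, the global section of haproxy.cfg has to send logs to facility local2, for example with a line such as log 127.0.0.1 local2; this is an assumption about your configuration, so check the existing log lines in your global section. The hypothetical helper below computes the priority for the local0 to local7 facilities.

```shell
#!/usr/bin/env bash
# Sketch: compute the syslog PRI value (facility * 8 + severity) that
# prefixes each UDP log message. syslog_pri is a hypothetical helper and
# only knows the local0..local7 facilities (codes 16..23).
set -euo pipefail

syslog_pri() {
  local facility="$1" severity="$2" f s
  case "$facility" in
    local[0-7]) f=$(( 16 + ${facility#local} )) ;;
    *) echo "unsupported facility: $facility" >&2; return 1 ;;
  esac
  case "$severity" in
    emerg) s=0 ;; alert) s=1 ;; crit) s=2 ;; err) s=3 ;;
    warning) s=4 ;; notice) s=5 ;; info) s=6 ;; debug) s=7 ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac
  echo $(( f * 8 + s ))
}
```

For example, syslog_pri local2 info prints 150. To check the logging path end to end, a util-linux logger that supports remote targets can send a test datagram with logger -d -n 127.0.0.1 -P 514 -p local2.info "haproxy log test", after which the message should appear in /var/log/haproxy.log.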

There's more…

Depending on your Ubuntu version, the default apt repository may not provide the latest version of HAProxy. Use the following PPA to install a newer release:

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:vbernat/haproxy-1.6 # replace 1.6 with required version
$ sudo apt-get update && sudo apt-get install haproxy
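To decide whether the distribution package is recent enough or the PPA is needed, you can compare version strings. The sketch below is a hypothetical helper, version_ge, that relies on the version sort (-V) of GNU sort; it is a convenience check, not Debian's exact version ordering, which dpkg --compare-versions implements.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering (-V); close to, but not identical to,
# Debian's ordering (dpkg --compare-versions implements that exactly).
set -euo pipefail

version_ge() {
  local lowest
  # Sort both strings by version; the smaller one comes first.
  lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -V | sed -n '1p')
  [ "$lowest" = "$2" ]
}
```

The installed version can be read with haproxy -v; assuming its first line has the form "HA-Proxy version 1.6.3 ...", the command haproxy -v | awk 'NR==1 {print $3}' extracts the version number to pass as the first argument.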

See also