Channel: nginx - Load Balancer - Considerable lag when upstream node is offline/down - Server Fault

nginx - Load Balancer - Considerable lag when upstream node is offline/down


Running nginx 1.0.15 on CentOS 6.5. I have three upstream servers and everything works fine; however, when I simulate an outage and take one of the upstream servers down, I notice considerable lag in response times (an additional 5-7 seconds). The second I bring the downed server back online, the lag disappears. Another weird thing I noticed: if I simply stop the httpd service on the simulated-outage server, response times are normal. The lag only occurs if the server is completely down.

Here is my conf:

upstream prod_example_com {
    server app-a-1:51000;
    server app-a-2:51000;
    server app-a-3:51000;
}

server {
    # link: http://wiki.nginx.org/MailCoreModule#server_name
    server_name example.com www.example.com *.example.com;

    #-----
    # Upstream logic
    #-----
    set $upstream_type prod_example_com;
    #-----

    include include.d/common.conf;

    # Configure logging
    access_log  /var/log/nginx/example/access/access.log access;
    error_log   /var/log/nginx/example/error.log error;

    location / {
        # link: http://wiki.nginx.org/HttpProxyModule#proxy_pass
        proxy_pass  http://$upstream_type$request_uri;

        # link: http://wiki.nginx.org/HttpProxyModule#proxy_set_header
        proxy_set_header    Host                $host;
        proxy_set_header    X-Real-IP           $remote_addr;
        proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;
    }

    location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
        # link: http://wiki.nginx.org/HttpProxyModule#proxy_pass
        proxy_pass  http://$upstream_type$request_uri;

        # link: http://wiki.nginx.org/HttpProxyModule#proxy_set_header
        proxy_set_header    Host                $host;
        proxy_set_header    X-Real-IP           $remote_addr;
        proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;

        proxy_hide_header expires;
        proxy_hide_header Cache-Control;

        # Even though this reads like the older syntax, it is handled internally
        # by nginx to set max age to now + 1 year
        expires max;

        # Allow intermediary caches the ability to cache the asset
        add_header Cache-Control "public";
    }
}
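One detail that may matter, and which the conf above leaves at its defaults: a stopped httpd refuses connections immediately (TCP reset), while a completely down host silently drops packets, so nginx has to wait out its connect timeout before trying the next server. A hedged sketch of the knobs involved; the values below are illustrative, not what I am actually running:

```nginx
location / {
    proxy_pass http://prod_example_com$request_uri;

    # Give up quickly on a host that drops packets, instead of
    # waiting the default 60s to establish a connection
    # (illustrative value, not from my conf).
    proxy_connect_timeout 2s;

    # On a connection error or timeout, retry the request against
    # the next server in the upstream group.
    proxy_next_upstream error timeout;
}
```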

I have tried the suggestions on similar posts like this one. Apparently my version of nginx is too old to support health_check as outlined in the nginx docs. I've also tried explicitly setting max_fails=2 and fail_timeout=120 on the app-a-3 upstream definition, but none of this avoids the additional 5-7 second lag on every request when app-a-3 is offline.
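For clarity, this is the shape of the upstream block I tried; the max_fails/fail_timeout values are the ones mentioned above:

```nginx
upstream prod_example_com {
    server app-a-1:51000;
    server app-a-2:51000;
    # After 2 failed attempts, consider this server unavailable
    # for 120 seconds before trying it again.
    server app-a-3:51000 max_fails=2 fail_timeout=120;
}
```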

-- Update --

Per request, here is the output for a single request when app-a-3 is completely down. The only thing I can see out of the ordinary is the 3-second lag between the initial event and the subsequent event.

-- Update #2 --

It looks like a few years ago Nginx decided to create Nginx Plus, which adds active health checks, but only under a yearly support contract. Based on some articles I've read, Nginx got sick of making companies millions and getting nothing in return.

As mentioned in the comments, we are bootstrapping and don't have the $$ to throw at a $1,350 contract. I did find this repo, which provides the functionality. Does anyone have any experience with it? Is it stable? Performant?

Worst case scenario, I will just have to bite the bullet and pay the extra $20/month for a Linode "NodeBalancer", which I am pretty sure is based on Nginx Plus. The only problem is that there is no control over the config other than a few generic options, so there is no way to support multiple vhost files via one balancer, and all the nodes have to be in the same datacenter.

-- Update #3 --

Here are some siege results. It seems the second node is misconfigured, as it is only able to handle about 75% of the requests the first and third nodes handle. I also thought it odd that when I took the second node offline, performance was as bad as when I took the third (better-performing) node offline. Logic would dictate that if I removed the weak link (the second node), I would get better performance, because the remaining two nodes each perform better than the weak link.

In short:

node 1, 2, 3 + my nginx = 2,037 requests
node 1, 2 + my nginx = 733 requests
node 1, 3 + my nginx = 639 requests (huh? these two perform better individually, so together they should be somewhere around ~1,500 requests, based on ~2,000 requests when all nodes are up)
node 1, 3 + Linode NodeBalancer = 790 requests
node 1, 2, 3 + Linode NodeBalancer = 1,988 requests
