Bad gateway errors at load on nginx + Unicorn (Rails 3 app)


I have a Rails (3.2) app that runs on nginx and Unicorn on a cloud platform. The "box" is running Ubuntu 12.04.

When the system load is at 70% or above, nginx abruptly (and seemingly randomly) starts throwing 502 Bad Gateway errors; when the load is lower, there is nothing of the sort. I have experimented with various numbers of cores (4, 6, 10 - I can "change the hardware" since it's on a cloud platform), and the situation is always the same. (The CPU load is similar to the system load, with userland at around 55% and the rest being system and stolen; there is plenty of free memory and no swapping.)

The 502s come in batches, but not always.

(I run 1 Unicorn worker per core, and 1 or 2 nginx workers. See the relevant parts of the configs below, as used when running on 10 cores.)

I don't know how to track down the cause of these errors. I suspect it may have to do with the Unicorn workers not being able to serve the requests (in time?), but that looks odd because they do not seem to saturate the CPU and I see no reason why they would wait for I/O (though I don't know how to make sure of that either).

Can you please help me with how to go about finding the cause?


Unicorn config (unicorn.rb):

worker_processes 10
working_directory "/var/www/app/current"
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
listen 2007, :tcp_nopush => true
timeout 90
pid "/var/www/app/current/tmp/pids/unicorn.pid"
stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
stdout_path "/var/www/app/shared/log/unicorn.stdout.log"
preload_app true
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true
check_client_connection false

before_fork do |server, worker|
  # ... I believe the stuff here is irrelevant ...
end

after_fork do |server, worker|
  # ... I believe the stuff here is irrelevant ...
end

And the nginx configs:

/etc/nginx/nginx.conf:

worker_processes 2;
worker_rlimit_nofile 2048;
user www-data www-admin;
pid /var/run/nginx.pid;
error_log /var/log/nginx/nginx.error.log info;

events {
  worker_connections 2048;
  accept_mutex on; # "on" if nginx worker_processes > 1
  use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;

    # optimization efforts
    client_max_body_size        2m;
    client_body_buffer_size     128k;
    client_header_buffer_size   4k;
    large_client_header_buffers 10 4k;  # 1 for each core or 1 for each unicorn worker?
    client_body_temp_path       /tmp/nginx/client_body_temp;

    include /etc/nginx/conf.d/*.conf;
}

/etc/nginx/conf.d/app.conf:

sendfile on;
tcp_nopush on;
tcp_nodelay off;
gzip on;
gzip_http_version 1.0;
gzip_proxied any;
gzip_min_length 500;
gzip_disable "msie [1-6]\.";
gzip_types text/plain text/css text/javascript application/x-javascript;

upstream app_server {
  # fail_timeout=0 means we always retry the upstream even if it failed
  # to return a good HTTP response (in case the Unicorn master nukes a
  # single worker for timing out).
  server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
}

server {
  listen 80 default deferred;
  server_name _;
  client_max_body_size 1g;
  keepalive_timeout 5;
  root /var/www/app/current/public;

  location ~ "^/assets/.*" {
    ...
  }

  # prefer to serve static files directly from nginx to avoid unnecessary
  # data copies from the application server.
  try_files $uri/index.html $uri.html $uri @app;

  location @app {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    proxy_pass http://app_server;

    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size          128k;
    proxy_buffers              10 256k;  # 1 per core or 1 per unicorn worker?
    proxy_busy_buffers_size    256k;
    proxy_temp_file_write_size 256k;
    proxy_max_temp_file_size   512k;
    proxy_temp_path            /mnt/data/tmp/nginx/proxy_temp;

    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid    30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors   on;
  }
}

After googling the expressions found in the nginx error log, this turned out to be a known issue that has nothing to do with nginx, little to do with Unicorn, and is rooted in the OS (Linux) settings.
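If you want to confirm this on your own box before changing anything, here is a rough sketch of checks that can point at it (log paths are the ones from the configs above; the exact counters and messages vary by kernel and tool version):

# recent upstream-related complaints from nginx
tail -n 50 /var/log/nginx/nginx.error.log

# kernel counter for overflowed TCP listen queues
netstat -s | grep -i 'listen queue'

# listening sockets and their queues (for TCP listeners, Send-Q shows the configured backlog)
ss -ltn
ss -lx | grep unicorn.sock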

The core of the problem is that the socket backlog is too short. There are various considerations as to how long it should be (whether you want to detect cluster member failure ASAP or keep the application pushing its load limits). In any case, the listen :backlog needs tweaking.
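The tweak itself is a one-line change in unicorn.rb (the path is the one from the config above; the 2048 value is discussed below):

# unicorn.rb - raise the backlog on the socket nginx proxies to
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 2048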

I found that in my case listen ... :backlog => 2048 was sufficient. (I did not experiment much, though there is a hack if you like: have 2 sockets for communication between nginx and Unicorn with different backlogs, the longer one being the backup; then you can see in the nginx error log how often the shorter queue fails - see the sketch below.) Please note that this is not the result of a scientific calculation and YMMV.
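A sketch of that two-socket hack, reusing the UNIX socket and TCP port from the configs above; I have not verified this exact variant, so treat it as an illustration rather than a recipe. Unicorn listens on both sockets with different backlogs, and nginx only falls over to the long queue when a connection to the short one fails (which it records in the error log):

# unicorn.rb - two listeners, short and long backlog
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
listen 2007, :tcp_nopush => true, :backlog => 2048

# app.conf - the long-backlog TCP listener is only used as a backup
upstream app_server {
  server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
  server 127.0.0.1:2007 backup;
}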

Note, however, that many OSes (most Linux distros, Ubuntu 12.04 included) have much lower OS-level default limits on socket backlog sizes (as low as 128).
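You can check the values currently in effect by reading the same sysctl keys:

sysctl net.core.somaxconn
sysctl net.core.netdev_max_backlog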

You can change the OS limits as follows (as root):

sysctl -w net.core.somaxconn=2048
sysctl -w net.core.netdev_max_backlog=2048

Add these to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting using sysctl -p.)
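With the values above, the permanent version is just two lines:

# /etc/sysctl.conf - raise the socket backlog limits
net.core.somaxconn = 2048
net.core.netdev_max_backlog = 2048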

There are mentions that you may have to increase the maximum number of files that can be opened by a process (use ulimit -n, and /etc/security/limits.conf for permanency). I had already done that for other reasons, so I cannot tell whether it made a difference or not.
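For completeness, a sketch of what that looks like; the user name and the 4096 limit below are placeholders, not values taken from the setup above:

# current per-process open file limit in this shell
ulimit -n

# /etc/security/limits.conf - example entries for the user running nginx/Unicorn
www-data soft nofile 4096
www-data hard nofile 4096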

