Bad gateway errors under load on nginx + Unicorn (Rails 3 app)
I have a Rails (3.2) app that runs on nginx and Unicorn on a cloud platform. The "box" runs Ubuntu 12.04.
When the system load is at about 70% or above, nginx abruptly (and seemingly randomly) starts throwing 502 Bad Gateway errors; when the load is lower, there's nothing like it. I have experimented with various numbers of cores (4, 6, 10; I can "change the hardware" since it's a cloud platform), and the situation is always the same. (CPU load is similar to system load: around 55% userland, the rest being system and stolen time, with plenty of free memory and no swapping.)
The 502s come in batches, but not always.
(I run one Unicorn worker per core, and 1 or 2 nginx workers. See the relevant parts of the configs below, as used when running on 10 cores.)
I don't know how to track down the cause of these errors. I suspect it may have to do with the Unicorn workers not being able to serve requests (in time?), but that looks odd because they do not seem to saturate the CPU, and I see no reason why they would be waiting on I/O (though I don't know how to make sure of that either).
Can you please help me with how to go about finding the cause?
Unicorn config (unicorn.rb):
    worker_processes 10
    working_directory "/var/www/app/current"
    listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
    listen 2007, :tcp_nopush => true
    timeout 90
    pid "/var/www/app/current/tmp/pids/unicorn.pid"
    stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
    stdout_path "/var/www/app/shared/log/unicorn.stdout.log"
    preload_app true
    GC.respond_to?(:copy_on_write_friendly=) and GC.copy_on_write_friendly = true
    check_client_connection false

    before_fork do |server, worker|
      # ... I believe the stuff here is irrelevant ...
    end

    after_fork do |server, worker|
      # ... I believe the stuff here is irrelevant ...
    end
And the nginx config, /etc/nginx/nginx.conf:
    worker_processes 2;
    worker_rlimit_nofile 2048;
    user www-data www-admin;
    pid /var/run/nginx.pid;
    error_log /var/log/nginx/nginx.error.log info;

    events {
        worker_connections 2048;
        accept_mutex on;  # "on" if nginx worker_processes > 1
        use epoll;
    }

    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;

        # optimization efforts
        client_max_body_size 2m;
        client_body_buffer_size 128k;
        client_header_buffer_size 4k;
        large_client_header_buffers 10 4k;  # one for each core or one for each unicorn worker?
        client_body_temp_path /tmp/nginx/client_body_temp;

        include /etc/nginx/conf.d/*.conf;
    }
/etc/nginx/conf.d/app.conf:
    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

    gzip on;
    gzip_http_version 1.0;
    gzip_proxied any;
    gzip_min_length 500;
    gzip_disable "msie [1-6]\.";
    gzip_types text/plain text/css text/javascript application/x-javascript;

    upstream app_server {
        # fail_timeout=0 means we always retry an upstream even if it failed
        # to return a good HTTP response (in case the Unicorn master nukes a
        # single worker for timing out).
        server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
    }

    server {
        listen 80 default deferred;
        server_name _;

        client_max_body_size 1g;
        keepalive_timeout 5;

        root /var/www/app/current/public;

        location ~ "^/assets/.*" {
            ...
        }

        # prefer to serve static files directly from nginx to avoid unnecessary
        # data copies from the application server
        try_files $uri/index.html $uri.html $uri @app;

        location @app {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://app_server;

            proxy_connect_timeout 90;
            proxy_send_timeout 90;
            proxy_read_timeout 90;

            proxy_buffer_size 128k;
            proxy_buffers 10 256k;  # one per core or one per unicorn worker?
            proxy_busy_buffers_size 256k;
            proxy_temp_file_write_size 256k;
            proxy_max_temp_file_size 512k;
            proxy_temp_path /mnt/data/tmp/nginx/proxy_temp;

            open_file_cache max=1000 inactive=20s;
            open_file_cache_valid 30s;
            open_file_cache_min_uses 2;
            open_file_cache_errors on;
        }
    }
After googling the expressions found in the nginx error log, this turned out to be a known issue that has nothing to do with nginx, little to do with Unicorn, and is rooted in OS (Linux) settings.
The core of the problem is that the socket backlog is too short. There are various considerations about how long it should be (whether you want to detect cluster member failure as soon as possible, or keep the application pushing its load limits). In any case, the listen :backlog needs tweaking.
I found that in my case listen ... :backlog => 2048 was sufficient. (I did not experiment much, though there's a good hack to do so if you like: have two sockets communicating between nginx and Unicorn with different backlogs, the longer one being a backup; then see in the nginx error log how often the shorter queue fails.) Please note that this is not the result of a scientific calculation, and YMMV.
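The two-socket hack mentioned above might be sketched in unicorn.rb roughly like this (the socket paths and backlog numbers here are illustrative assumptions, not values from my box):

```ruby
# Primary socket with a deliberately short backlog: when its queue fills up,
# nginx fails over to the backup socket and logs the failure, so the nginx
# error log shows how often the short queue overflows under real load.
listen "/var/www/app/current/tmp/sockets/unicorn_primary.sock", :backlog => 64

# Backup socket with a long backlog, to be listed with the "backup"
# parameter in the nginx upstream block so it is used only on failover.
listen "/var/www/app/current/tmp/sockets/unicorn_backup.sock", :backlog => 2048
```

On the nginx side you would list both sockets as servers in the upstream block, marking the second one with the backup parameter.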
Note, however, that many OSes (most Linux distros, Ubuntu 12.04 included) have much lower OS-level default limits on socket backlog sizes (as low as 128).
You can change the OS limits as follows (as root):
    sysctl -w net.core.somaxconn=2048
    sysctl -w net.core.netdev_max_backlog=2048
Add these to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting with sysctl -p.)
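To check whether the backlog cap is actually being hit, you can inspect the current limit and the kernel's listen-queue counters; a sketch (on Linux, the exact wording of the netstat counters varies by version):

```shell
# Current OS-level cap on listen() backlogs (128 by default on Ubuntu 12.04):
cat /proc/sys/net/core/somaxconn

# Kernel counters for connections dropped because a listen queue was full;
# if these keep growing under load, the backlog is too short:
netstat -s | grep -i 'listen'
```

If the overflow counters climb while the 502s appear, that corroborates the backlog diagnosis before you start tuning numbers.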
There are mentions that you may also have to increase the maximum number of files that can be opened by a process (use ulimit -n, and /etc/security/limits.conf for permanency). I had already done that for other reasons, so I cannot tell whether it makes a difference or not.
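For completeness, checking and raising the file-descriptor limit might look like this (the user name and the 65536 figure below are illustrative assumptions, not recommendations from this answer):

```shell
# Show the current soft limit on open file descriptors for this shell:
ulimit -n

# For permanency, lines like these would go in /etc/security/limits.conf:
#   www-data  soft  nofile  65536
#   www-data  hard  nofile  65536
```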