In a multi-server environment, if a site is inactive for more than 15 minutes, the server loses its connection to the PostgreSQL database

限于喜欢 submitted on 2019-12-10 11:42:26

Question


I get the following errors in Airbrake when my staging (2 servers) or production (4 servers) environment has no activity for about 15 minutes. Here are the error messages:

ActiveRecord::StatementInvalid: PG::Error: could not receive data from server: Connection timed out

OR

PG::Error: could not connect to server: Connection timed out Is the server running on host "tci-db4.dev.prod" and accepting TCP/IP connections on port 5432?

I'm using PostgreSQL as my database. One of the servers also acts as the db server.

Environment:

Ruby 1.9.3 (This also happened under Ruby 1.8.7, but it has been worse since upgrading: when the server loses the db connection, the ruby process on the server goes to 100% and stays at 100% until it is killed.)

Rails 3.1.6

pg gem 0.13.2

Postgres 9.1

Phusion Passenger

This problem has been happening for over a year, so I'm hoping someone has some insight on how to fix it. Thanks.


Answer 1:


Check your TCP/IP socket timeout settings on all routers/switches between the application servers and the database servers. Also turn on logging on the database side and watch the full life cycle of the connection and compare the timing to the errors in your application. I suggest turning on the following settings in postgresql.conf until you get an idea of what to look for:

log_connections = on
log_disconnections = on
log_statement = all

These can be activated by sending a SIGHUP to the postgres process (or by running "SELECT pg_reload_conf();" as a database superuser).

I'll bet that you have a "connection closed by remote host" or something similar as the last message before the actual disconnect is logged.

I've seen this before and it was the timeout settings on an intermediate switch causing it.




Answer 2:


You probably have a NAT router, connection tracking firewall, or an uppity "layer 3 switch" between the client and the server. These devices flush remembered connections from their tables after a timeout. You will need to enable keepalives.
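One way to enable keepalives on the client side is through libpq's connection parameters (`keepalives`, `keepalives_idle`, `keepalives_interval`, `keepalives_count`). The following is only a sketch in plain Ruby that assembles such a connection string. The helper names, the database name `app_db`, and the timing values are assumptions for illustration, and whether your Rails version forwards extra keys from database.yml to libpq is something to verify for your setup.

```ruby
# Quote a value for a libpq connection string: wrap it in single quotes
# and backslash-escape any single quotes or backslashes inside it.
def quote_conninfo_value(value)
  escaped = value.to_s.gsub(/[\\']/) { |ch| "\\#{ch}" }
  "'#{escaped}'"
end

# Build a libpq connection string with TCP keepalives switched on.
# The keepalive parameter names are libpq's; the default numbers are
# illustrative assumptions, not recommendations from the thread.
def keepalive_conninfo(base, idle = 300, interval = 30, count = 3)
  params = base.merge(
    'keepalives'          => 1,        # 1 = use TCP keepalives
    'keepalives_idle'     => idle,     # seconds of idleness before the first probe
    'keepalives_interval' => interval, # seconds between unanswered probes
    'keepalives_count'    => count     # unanswered probes before the link is considered dead
  )
  params.map { |key, value| "#{key}=#{quote_conninfo_value(value)}" }.join(' ')
end

# Host taken from the error message in the question; 'app_db' is hypothetical.
conninfo = keepalive_conninfo(
  'host' => 'tci-db4.dev.prod', 'port' => 5432, 'dbname' => 'app_db'
)
puts conninfo
```

The resulting string is the kind of value the pg gem's connection constructor accepts. The idle value should be shorter than the timeout of whatever device sits between the servers, so that a probe crosses it before the connection entry is flushed.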




Answer 3:


Maintaining a lot of keepalive-enabled connections from 4 application servers may be quite hard to do (it may represent a very high number of connections). You could look at Pgpool-II to maintain a reasonable number of keepalive-enabled connections between pgpool and your Postgres server. Pgpool will also queue connection requests when too many processes ask for a connection. After that, check how the connections are managed in your application. Is there a pool of connections managed in the app server? Do you still need it? Do you need long-lived connections, or can you simply use short-lived sessions?

If you still have disconnected sessions between Pgpool and your PostgreSQL server, you will have to check for TCP/IP problems. Such problems can come from the OS TCP/IP settings, but these can also be tuned in the PostgreSQL configuration. Check the tcp_keepalives_* settings on the runtime configuration manual page. If you use pgpool, check its health_check settings.
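To reason about which keepalive values to pick, a little arithmetic helps. The sketch below assumes the intermediate device drops idle connections after roughly 15 minutes (900 seconds), which is inferred from the symptom in the question and not measured. The helper names are made up for illustration. Linux's default idle time before the first keepalive probe is 7200 seconds, which is why keepalives left at OS defaults would not help here.

```ruby
# A keepalive probe only keeps a middlebox's connection entry alive if it
# is sent before the middlebox's idle timeout expires.
def keepalive_beats_timeout?(idle_seconds, device_timeout_seconds)
  idle_seconds < device_timeout_seconds
end

# Rough worst-case time to notice a dead peer: the idle wait plus
# `count` unanswered probes spaced `interval` seconds apart.
def dead_peer_detection_seconds(idle_seconds, interval_seconds, count)
  idle_seconds + interval_seconds * count
end

SUSPECTED_DEVICE_TIMEOUT = 15 * 60 # assumption: about 15 minutes, from the question
LINUX_DEFAULT_IDLE       = 7200    # Linux default tcp_keepalive_time, in seconds

puts keepalive_beats_timeout?(LINUX_DEFAULT_IDLE, SUSPECTED_DEVICE_TIMEOUT)
puts keepalive_beats_timeout?(300, SUSPECTED_DEVICE_TIMEOUT)
puts dead_peer_detection_seconds(300, 30, 3)
```

On the server side the corresponding postgresql.conf settings are tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count. A value of 0 for any of them means the operating system default is used.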



Source: https://stackoverflow.com/questions/11417504/in-a-multi-server-environment-if-a-site-has-inactivity-for-more-than-15-mn-the
