Postgresql fails specific query ONE time after Windows reboot

那年仲夏 提交于 2020-01-23 10:46:48

问题


I'm using Postgresql on Windows in a C# application. The problem I'm having is really weird and can be described as follows:

  • I restart my Windows
  • I run the program
  • One specific query fails: SELECT COUNT(*) AS c FROM files WHERE total_bytes IS NOT NULL
  • I run the program again and everything works normally

Weird notes:

  • I tried making another query before that one (even using the same table) and it worked: SELECT COUNT(*) AS c FROM files
  • I wasn't able to reproduce the error restarting Postgresql. It only happens on a Windows reboot. And it happens only once.

The exception traceback:

Npgsql.NpgsqlException: Exception while reading from stream

   at Npgsql.ReadBuffer.Ensure(Int32 count, Boolean dontBreakOnTimeouts)
   at Npgsql.NpgsqlConnector.DoReadMessage(DataRowLoadingMode dataRowLoadingMode, Boolean isPrependedMessage)
   at Npgsql.NpgsqlConnector.ReadMessageWithPrepended(DataRowLoadingMode dataRowLoadingMode)
   at Npgsql.NpgsqlConnector.ReadMessage(DataRowLoadingMode dataRowLoadingMode)
   at Npgsql.NpgsqlConnector.ReadExpecting[T]()
   at Npgsql.NpgsqlDataReader.NextResultInternal()
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.Execute(CommandBehavior behavior)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReaderInternal(CommandBehavior behavior)
   at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Common.DbCommand.ExecuteReader()
   at Npgsql.NpgsqlCommand.ExecuteReader()
   at DriveShare.Database.Postgresql.ExecuteQuery(NpgsqlCommand command) in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\Postgresql.cs:line 216
   at DriveShare.Database.Postgresql.Query(String sql, Object[] args) in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\Postgresql.cs:line 72
   at DriveShare.Database.Postgresql.QueryOne(String sql, Object[] args) in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\Postgresql.cs:line 83
   at DriveShare.Database.Postgresql.QueryValue(String key, String sql, Object[] args) in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\Postgresql.cs:line 97
   at DriveShare.Database.Postgresql.QueryValue(String key, String sql) in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\Postgresql.cs:line 92
   at DriveShare.Database.FileIndexDataSet.CountIndexedFiles() in c:\projetos\driveshareclient\DriveShare\DriveShare\Database\FileIndexDataSet.cs:line 89
   at DriveShare.Engine.DriveShareEngine.Start() in c:\projetos\driveshareclient\DriveShare\DriveShare\Engine\DriveShareEngine.cs:line 156
   at DriveShareWebService.Program.Main(String[] args) in c:\projetos\driveshareclient\DriveShare\DriveShareWebService\Program.cs:line 19

Since I have to keep the program working, I wrote a workaround to make the app retry that query before proceeding. I'm not proud of that:

public void WaitForConnection()
{
    int limitSeconds = 3 * 60;
    var start = DateTime.Now;
    while (true)
    {
        try
        {
            Log.WaitingForDatabaseConnection();
            Query("SELECT COUNT(*) AS c FROM files WHERE total_bytes IS NOT NULL");
            Log.DatabaseConnectionAquired();
            break;
        }
        catch (Exception e)
        {
            var wastedTime = DateTime.Now - start;
            if (wastedTime.TotalSeconds > limitSeconds)
                throw;
            else
                Log.Exception(e);
        }
        Thread.Sleep(1000);
    }
}

I'm using Npgsql (in a thin abstraction class) to connect to Postgresql. Postgresql logs show some entries about winsock errors that I do not yet understand:

2016-08-16 10:14:34 BRT LOG:  database system was shut down at 2016-08-16 10:12:07 BRT
2016-08-16 10:14:34 BRT FATAL:  the database system is starting up
2016-08-16 10:14:34 BRT LOG:  MultiXact member wraparound protections are now enabled
2016-08-16 10:14:34 BRT LOG:  sistema de banco de dados está pronto para aceitar conexões
2016-08-16 10:14:34 BRT LOG:  autovacuum launcher started
2016-08-16 10:17:16 BRT LOG:  could not receive data from client: unrecognized winsock error 10053
2016-08-16 10:17:27 BRT LOG:  could not send data to client: unrecognized winsock error 10054
2016-08-16 10:17:27 BRT STATEMENT:  SELECT path FROM files
2016-08-16 10:17:27 BRT FATAL:  connection to client lost
2016-08-16 10:17:27 BRT STATEMENT:  SELECT path FROM files
2016-08-16 10:17:27 BRT LOG:  could not receive data from client: unrecognized winsock error 10053
2016-08-16 10:17:27 BRT LOG:  unexpected EOF on client connection with an open transaction
2016-08-16 10:17:33 BRT LOG:  unexpected EOF on client connection with an open transaction
2016-08-16 10:25:14 BRT LOG:  could not receive data from client: unrecognized winsock error 10053
2016-08-16 10:25:15 BRT LOG:  could not receive data from client: unrecognized winsock error 10053
2016-08-16 10:25:15 BRT LOG:  unexpected EOF on client connection with an open transaction
2016-08-16 10:26:30 BRT LOG:  could not send data to client: unrecognized winsock error 10054
2016-08-16 10:26:30 BRT FATAL:  connection to client lost
2016-08-16 10:26:50 BRT LOG:  could not send data to client: unrecognized winsock error 10054
2016-08-16 10:26:50 BRT FATAL:  connection to client lost
2016-08-16 10:26:50 BRT LOG:  could not receive data from client: unrecognized winsock error 10053
2016-08-16 10:26:50 BRT LOG:  unexpected EOF on client connection with an open transaction
2016-08-16 10:27:06 BRT LOG:  could not send data to client: unrecognized winsock error 10054
2016-08-16 10:27:06 BRT FATAL:  connection to client lost
2016-08-16 10:27:06 BRT LOG:  could not send data to client: unrecognized winsock error 10054
2016-08-16 10:27:06 BRT FATAL:  connection to client lost
2016-08-16 10:27:30 BRT LOG:  pedido de desligamento rápido foi recebido
2016-08-16 10:27:30 BRT LOG:  interrompendo quaisquer transações ativas
2016-08-16 10:27:30 BRT LOG:  autovacuum launcher shutting down
2016-08-16 10:27:30 BRT ERROR:  canceling statement due to user request
2016-08-16 10:27:30 BRT LOG:  autovacuum launcher shutting down
2016-08-16 10:27:30 BRT LOG:  shutting down
2016-08-16 10:27:30 BRT LOG:  database system is shut down

I don't expect someone to know what exactly what my problem is. I was just hoping someone might have had some similar issues that could shed some light in it.


回答1:


With some help I found the solution in the Npgsql docs, here.

Npgsql by default comes with some timeout parameters for connection and commands. After a Windows reboot, first access to the table was very slow, triggering the command timeout.

With additional parameters on the connection string I was able to change those settings higher and solve my problem:

connectionString += ";Timeout=180;Command Timeout=180";

Bonus tip: a Postgresql function pg_sleep(seconds) helped me reproduce the problem without actual reboots. Very helpful:

SELECT pg_sleep(60);


来源:https://stackoverflow.com/questions/38977119/postgresql-fails-specific-query-one-time-after-windows-reboot

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!