postgres: uncommitted xmin from before xid cutoff needs to be frozen

坚强是说给别人听的谎言 submitted on 2020-12-13 03:44:45

Question


PostgreSQL 9.6. The server recently recovered from a sudden power-off.

When we run a SELECT in pgAdmin on the table Current, it shows invalid page in block 6455316 of relation base/16384/31656. We then tried to fix the problem with VACUUM FULL "Current", but it shows ERROR: uncommitted xmin 491792044 from before xid cutoff 492223244 needs to be frozen.

If we reindex the table directly, it shows:

WARNING:  invalid page in block 6455316 of relation base/16384/31656; zeroing out page
WARNING:  invalid page in block 6455317 of relation base/16384/31656; zeroing out page
WARNING:  invalid page in block 6455318 of relation base/16384/31656; zeroing out page
WARNING:  invalid page in block 6455319 of relation base/16384/31656; zeroing out page
WARNING:  invalid page in block 6455320 of relation base/16384/31656; zeroing out page
WARNING:  invalid page in block 6455321 of relation base/16384/31656; zeroing out page

I searched around for days, but still no luck. So how can I solve this problem?


Answer 1:


The “invalid page” error shows that you have data corruption, and the other message is probably also a symptom of corruption that did not make the block invalid, but still corrupted the contents.
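To make the numbers in the second message concrete: as far as I understand it, VACUUM wants to freeze every row whose inserting transaction (xmin) is older than the cutoff, and it refuses when that transaction is not recorded as committed. Transaction IDs are 32-bit counters that wrap around, so "older" is decided by the signed 32-bit difference. Below is a minimal sketch of that comparison for normal XIDs; the function name xid_precedes is made up for illustration.

```shell
#!/bin/sh
# xid_precedes A B (hypothetical helper): succeeds if XID A is logically
# older than XID B. Normal XIDs compare modulo 2^32: A precedes B when
# the difference A - B, read as a signed 32-bit integer, is negative.
xid_precedes() {
    diff=$(( ($1 - $2) & 0xFFFFFFFF ))
    # A value of 2^31 or more means the signed 32-bit difference is negative.
    [ "$diff" -ge 2147483648 ]
}

# Values taken from the error message in the question.
if xid_precedes 491792044 492223244; then
    echo "xmin is $(( 492223244 - 491792044 )) transactions older than the cutoff"
fi
# prints: xmin is 431200 transactions older than the cutoff
```

So the row was written well before the freeze cutoff, yet its transaction never shows up as committed, which fits a crash that damaged or lost part of the on-disk state.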

You should restore a backup and investigate if you have hardware problems with your memory or secondary storage. Also, make sure to update PostgreSQL to the latest bugfix release. Creating a database cluster with --data-checksums makes PostgreSQL detect such problems earlier.
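When checking the storage, it can help to know where the invalid page from the question lives on disk. The sketch below maps a relation path and block number to a segment file and byte offset. It assumes the default 8 kB block size and 1 GB segment size (a server built with other values gives different numbers), and the function name locate_block is made up for illustration. The path is relative to the data directory.

```shell
#!/bin/sh
# Assumed build defaults: 8 kB blocks, 1 GB segment files.
BLOCK_SIZE=8192
BLOCKS_PER_SEGMENT=131072   # 1 GiB / 8 KiB

# locate_block RELPATH BLOCKNO (hypothetical helper)
# Prints "<segment file> <byte offset inside that file>".
locate_block() {
    relpath=$1
    blockno=$2
    segno=$(( blockno / BLOCKS_PER_SEGMENT ))
    offset=$(( (blockno % BLOCKS_PER_SEGMENT) * BLOCK_SIZE ))
    if [ "$segno" -eq 0 ]; then
        # The first segment has no numeric suffix.
        echo "$relpath $offset"
    else
        echo "$relpath.$segno $offset"
    fi
}

# First invalid block reported in the question.
locate_block base/16384/31656 6455316
# prints: base/16384/31656.49 268599296
```

Under those assumptions the six reported blocks sit next to each other in the same segment file, which is consistent with one damaged region rather than scattered corruption.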




Answer 2:


Well, the problem is fixed, even though we don't know how it actually happened. We just ran SET zero_damaged_pages = on and then recreated the index on the broken table.
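A rough sketch of how those two steps could be run in one session follows. The function name repair_sql and the database name dbname are placeholders, not something from the original post. Both statements have to run in the same session, because SET only affects the current one, and zero_damaged_pages can only be changed by a superuser. Rows on the zeroed pages are lost, so copy the data directory first.

```shell
#!/bin/sh
# repair_sql TABLE (hypothetical helper): prints the SQL for the repair.
# The table name is double-quoted because "Current" is mixed-case.
# A name that itself contains a double quote is not handled here.
repair_sql() {
    cat <<EOF
SET zero_damaged_pages = on;
REINDEX TABLE "$1";
EOF
}

# Review the statements first:
repair_sql Current

# Then, as a superuser, feed them to a single psql session
# ("dbname" is a placeholder):
# repair_sql Current | psql -v ON_ERROR_STOP=1 -d dbname
```

The psql line is left commented out on purpose, so that running the snippet only prints the statements.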




Answer 3:


--> Restoring from backup should be the last option.

--> Setting zero_damaged_pages to 'on' will lead to data loss

Instead, check whether your standby/slave DB is fine and either promote it, or take a table dump from the standby and restore it on the primary.




Answer 4:


After a lot of research, we came to several conclusions for our situation:

  • There are different types of backup:
    • exporting as plaintext SQL queries
    • dumping binary using pg_dump
    • making a copy of the pg files on disk
    • taking a snapshot of the whole virtual machine

Whatever your strategy is, if you take the backup while your environment is in use, there is a good chance that the data inside your backup will be corrupted.

After spending too many hours debugging our backups, we found that restoring the files on disk to a specific moment in time brought back table files with corrupted data inside.

In our database most things run within transactions, which should prevent corrupt or broken data inside our tables. But when you copy table files on the hard disk, you will always have "uncommitted data" inside your tables. When you restore these tables, the data is still uncommitted, but the transaction does not exist on the restored system, so it's in limbo.

Traversing the table gave problems. REINDEXing the table fixed the issue of reading through the data, but for some reason most of the data was no longer in the index (so our table shrank, having lost a lot of data).

In our case, VACUUM (FULL) did not achieve anything useful.

When we restored from another backup (backup type 3 above, a copy of the pg files on disk), we ran into this error:

LOG:  redo starts at 160/1D7E62C8
LOG:  invalid record length at 160/1EBFD408: wanted 24, got 0
LOG:  redo done at 160/1EBFD398

And the result was that postgres removed our whole database. The problem was that the base folder inside the postgres folder on our hard disk contained the database, but the pg_wal folder did not contain the right references to it, so it was deleted as a whole.

To summarize: uncommitted data is data that was written because you were in the middle of a transaction when the backup was taken OR when the server was suddenly shut off. REINDEXing your tables (or the whole database) is your best shot, but only do this after making a restore point first.

So first make a snapshot, a tar.gz, a zip file, or similar of the current postgresql folder.

service postgresql stop
cd /home
tar zcfv pg_backup.tar.gz /var/lib/postgresql/11/
service postgresql start

And then start performing maintenance:

REINDEX DATABASE dbname;

And if you want to free up the dead tuples after reindexing:

VACUUM FULL;

If that fixed things, you're OK. If not, you can try to drop the table and re-import it from an earlier backup, possibly a plaintext one. If you don't have such a backup, try to export the table data with an IDE, or with a script you write yourself, to pull as much data as possible out of the database. Then create a structure export of the table, drop the table (with all the uncommitted, corrupted binary data in it), re-create the table, and run the exported SQL queries to refill it.



Source: https://stackoverflow.com/questions/57950050/postgres-uncommitted-xmin-from-before-xid-cutoff-needs-to-be-frozen
