What is the best way to load a massive amount of data into PostgreSQL?


Do NOT use indexes except for a single unique numeric key.

That doesn't fit with the DB theory we were all taught, but testing with heavy loads of data demonstrates it. Here is the result of loading 100M rows at a time until reaching 2 billion rows in a table, running a batch of various queries on the resulting table after each load. The first graph is with a 10 gigabit NAS (150 MB/s), the second with 4 SSDs in RAID 0 (R/W @ 2 GB/s).

If you have more than 200 million rows in a table on regular disks, it's faster to forget indexes. On SSDs, the limit is around 1 billion rows.
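A minimal sketch of that workflow, dropping secondary indexes before a COPY load and rebuilding them afterwards. The table name, columns, index name, file path, and the psycopg2 driver are all assumptions for illustration, not from the original answer:

```python
# Sketch: bulk load with secondary indexes dropped, keeping only the unique numeric key.
# Assumed schema: events(id bigint primary key, payload text); assumed CSV at /data/chunk.csv.
import psycopg2

def bulk_load(dsn, csv_path):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # Drop the secondary index; keep only the unique numeric key (the PK).
            cur.execute("DROP INDEX IF EXISTS events_payload_idx;")
            # COPY is the fastest built-in ingestion path in PostgreSQL.
            with open(csv_path) as f:
                cur.copy_expert(
                    "COPY events (id, payload) FROM STDIN WITH (FORMAT csv)", f
                )
            # Rebuild the secondary index once, after all rows are in place.
            cur.execute("CREATE INDEX events_payload_idx ON events (payload);")
    # Leaving the connection context commits the transaction.

if __name__ == "__main__":
    bulk_load("dbname=test", "/data/chunk.csv")
```

Rebuilding one index over the finished table is generally much cheaper than maintaining it row by row during a load of this size.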

I've also done it with partitions for better results, but with PG 9.2 it's difficult to benefit from them if you use stored procedures. You also have to take care to write/read only one partition at a time. However, partitions are the way to go to keep your tables below the 1-billion-row wall, and they help a lot with multiprocessing your loads (see the sketch below). With SSDs, a single process lets me insert (COPY) 18,000 rows/s (with some processing work included). With multiprocessing on 6 CPUs, it grows to 80,000 rows/s.
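A sketch of the multiprocess load, one worker per partition so each process writes to exactly one partition at a time. It assumes child tables events_p0 .. events_p5 already exist (inheritance partitions on PG 9.x, declarative partitions on PG 10+), one CSV file per partition, and 6 workers to match 6 CPUs; none of those names come from the original answer:

```python
# Sketch: parallel COPY, one process per partition, so no two workers touch the same table.
import multiprocessing
import psycopg2

DSN = "dbname=test"  # assumed connection string

def load_partition(part_no):
    csv_path = f"/data/chunk_{part_no}.csv"   # assumed file layout, one file per partition
    table = f"events_p{part_no}"              # each worker writes to a single partition
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            with open(csv_path) as f:
                cur.copy_expert(
                    f"COPY {table} (id, payload) FROM STDIN WITH (FORMAT csv)", f
                )

if __name__ == "__main__":
    # One process per CPU core.
    workers = [multiprocessing.Process(target=load_partition, args=(i,))
               for i in range(6)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```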

Watch your CPU and I/O usage while testing so you can optimize both.
