Specific PostgreSQL server configuration for data analysis purposes

Submitted by ε祈祈猫儿з on 2019-12-23 19:35:09

Question


Are there any tips for tuning server performance through the postgresql.conf file when a PostgreSQL database is used specifically by a data science department for data analysis? Or is performance tuning purpose-agnostic, so that it makes no real difference what you do with the database, since 'it is all about extracting data'?

It's a rather obscure question that I couldn't find an answer to (in the myriad articles on data science).


Answer 1:


Though this is a very general question, I'll try my best to give you a hint or two:

You could first assess the outline of your requirements, such as:

  • Are we talking about big chunks of data? (buffer sizes)
  • How many clients run queries? (allowed connections)
  • Are you using PostgreSQL's internal functions?
  • Do you need permanent backups, or to copy tables or databases around?
  • etc.
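To make the first two questions concrete, here is a minimal sketch that turns answers to them into first-pass values for `shared_buffers` and `max_connections`. The function name, the `connections_per_client` guess, and the example inputs (64GB RAM, 5 clients) are hypothetical and not from the question. The 25% figure follows the starting value the PostgreSQL documentation suggests for a dedicated server. Treat the output as something to benchmark against, not as a recommendation.

```python
def starting_points(ram_gb, clients, connections_per_client=4, reserved=3):
    """Turn answers to the sizing questions into first-pass settings.

    Every heuristic here is an assumption for illustration, not a rule.
    """
    return {
        # The PostgreSQL docs suggest about 25% of RAM as a starting value
        # for shared_buffers on a dedicated database server.
        "shared_buffers": f"{ram_gb // 4}GB",
        # Few analysts means few connections; `reserved` mirrors the default
        # of superuser_reserved_connections (3).
        "max_connections": clients * connections_per_client + reserved,
    }


# Hypothetical example inputs: a 64GB server used by 5 analysts.
settings = starting_points(ram_gb=64, clients=5)
for name, value in settings.items():
    print(f"{name} = {value}")
```

The printed lines use the same `name = value` form as postgresql.conf, so they can be compared directly with the current settings.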

I recommend reading the official documentation on resource consumption and on query planning, as well as on server configuration in general.

If you can't derive a proper approach after reading the docs, I can recommend the pg-forum. The experienced user 'akretschmer' is a PostgreSQL pro and might be able to help if you formulate your question in a detailed and meaningful way ;)




Answer 2:


The same question as the OP's had occurred to me, and I couldn't find anything about it. Our requirement is simply two data scientists accessing the data, slicing it, exploring it, and so on. Here is our current setup and configuration:

  • Data: 5 billion rows (~300GB) of AWS CloudWatch 5-minute data
  • Hardware: AWS EC2 t2.2xlarge (8 cores, 32GB RAM, 500GB gp2 disk)
  • PostgreSQL version 10
  • Modified sections of /etc/postgresql/10/main/postgresql.conf:
work_mem = 25GB
maintenance_work_mem = 25GB

max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 4
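One thing worth checking with values like these: `work_mem` is a limit per sort or hash operation in each process, not a total for the server, so parallel workers can each claim it. The sketch below works out a rough upper bound for a single query. The function name and the assumption of one sort/hash node per process are illustrative. The inputs are the 32GB of RAM, `work_mem = 25GB`, and `max_parallel_workers_per_gather = 4` listed above.

```python
GB = 1024 ** 3


def worst_case_work_mem(work_mem_bytes, workers_per_gather,
                        sort_hash_nodes=1, sessions=1):
    """Rough upper bound on memory that sort/hash nodes may claim.

    Assumes every process (the parallel workers plus the leader) runs
    `sort_hash_nodes` such operations at once; real plans vary.
    """
    processes_per_query = workers_per_gather + 1  # workers plus the leader
    return work_mem_bytes * sort_hash_nodes * processes_per_query * sessions


ram = 32 * GB
work_mem = 25 * GB
peak = worst_case_work_mem(work_mem, workers_per_gather=4)

print(f"worst case: {peak / GB:.0f} GB of {ram / GB:.0f} GB RAM")
print("fits in RAM" if peak <= ram else "can exceed RAM")
```

Under these assumptions the bound comes to 125GB against 32GB of RAM, so a single large parallel sort could in principle push the instance into swapping or an out-of-memory kill. Dividing the memory you are willing to spend by the number of processes and concurrent sessions gives a more conservative `work_mem`.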

I'll be more than happy if someone has further suggestions.

Edit: I posted this as a question on DBA Stack Exchange for further suggestions.



Source: https://stackoverflow.com/questions/52775971/specific-postgresql-server-configuration-for-data-analysis-purposes
