Question
Are there any tips for tuning a server's performance through the postgresql.conf file when the PostgreSQL database is used specifically by a data science department for data analysis? Or is performance tuning purpose-agnostic, so that it makes no real difference what you do with the database, since 'it is all about extracting data'?
It's a rather obscure question that I couldn't find an answer to (in the myriad articles on data science).
Answer 1:
Though this is a very general question, I'll try my best to give you a hint or two:
You could first assess the outline of your requirements, such as:
- are we talking about big chunks of data? (buffer sizes)
- how many clients are running queries? (allowed connections)
- are you using PostgreSQL's built-in functions?
- do you need permanent backups, or do you copy tables or databases around?
- etc.
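As a rough illustration of how the first two questions translate into settings, here is a minimal Python sketch. The 25% figure for shared_buffers is the starting value the official docs mention for a dedicated server; the 25% slice reserved for work_mem, the headroom of 10 connections, and the function name `starting_points` are assumptions made up for this example, not recommendations. Treat the output as values to benchmark, not to copy.

```python
def starting_points(ram_gb: int, clients: int) -> dict:
    """Rough postgresql.conf starting values; assumptions to benchmark, not recommendations."""
    ram_mb = ram_gb * 1024
    # Buffer sizes: the official docs mention about 25% of RAM as a reasonable
    # starting value for shared_buffers on a dedicated database server.
    shared_buffers_mb = ram_mb // 4
    # Assumption: reserve another 25% of RAM for sorts and hashes and split it
    # evenly across the expected clients.
    work_mem_mb = (ram_mb // 4) // clients
    # Allowed connections: hypothetical headroom of 10 extra slots for admin
    # sessions and tools on top of the expected clients.
    max_connections = clients + 10
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "work_mem": f"{work_mem_mb}MB",
        "max_connections": str(max_connections),
    }


# Example: a 32 GB machine used by 2 analysts.
for name, value in starting_points(ram_gb=32, clients=2).items():
    print(f"{name} = {value}")
```

The printed lines use the same `name = value` form as postgresql.conf, so they can be compared directly with the current settings.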
I would recommend reading the official docs on resource consumption and on query planning, as well as the docs on server configuration in general.
If you can't derive a proper approach after reading the docs, I can recommend the pg-forum. The experienced user 'akretschmer' is a PostgreSQL pro and might be able to help you if you formulate your question in a detailed and meaningful way ;)
Answer 2:
The same question as the OP's had occurred to me, and I couldn't find anything about it. Our requirement is simply 2 data scientists accessing the data, slicing it, exploring it, etc. Here is our current setup and configuration:
- Data: 5 billion rows (~300GB) of AWS CloudWatch 5-minute data
- Hardware: AWS EC2 t2.2xlarge (8 cores, 32GB RAM, 500GB gp2 disk)
- PostgreSQL version 10
- Modified sections of /etc/postgresql/10/main/postgresql.conf:
work_mem = 25GB
maintenance_work_mem = 25GB
max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 4
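One thing worth checking about the settings above: work_mem is not a per-server limit. It applies to each sort or hash operation in each process, and parallel workers get their own allowance. The Python sketch below does that arithmetic for a single parallel query. The function name and the assumption of one memory-hungry operation per process are illustrative, and this is a theoretical upper bound, not a prediction of actual usage.

```python
def worst_case_query_memory_gb(work_mem_gb: int,
                               workers_per_gather: int,
                               memory_hungry_nodes: int = 1) -> int:
    """Upper bound on work_mem-governed memory for one parallel query.

    Assumes every participating process (the workers plus the leader) runs
    `memory_hungry_nodes` sort/hash operations that each use a full work_mem.
    """
    processes = workers_per_gather + 1  # parallel workers plus the leader
    return work_mem_gb * processes * memory_hungry_nodes


ram_gb = 32  # RAM of the instance described above
peak_gb = worst_case_query_memory_gb(work_mem_gb=25, workers_per_gather=4)
print(f"worst case: {peak_gb} GB requested vs {ram_gb} GB of RAM")
```

With these numbers the bound is several times the machine's RAM, so a single large parallel sort or hash join could in principle push the server into swapping or out-of-memory territory. Whether that happens depends on the actual queries.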
I'll be more than happy if someone has further suggestions.
Edit: I posted this as a question on DBA Stack Exchange to get further suggestions.
Source: https://stackoverflow.com/questions/52775971/specific-postgresql-server-configuration-for-data-analysis-purposes