How do I know which is the more appropriate database design? (Authors, Articles & Comments)

Asked by 心在旅途 · 2021-01-17 04:53

Let's assume a database with three tables: Authors, Articles, Comments

Assuming the relationship is as follows: an author has many articles, and each article has many comments. The two designs being compared: the first links the tables with plain foreign keys; the second also keeps a comment count on each author row, incremented whenever a comment is posted.
3 Answers
  •  没有蜡笔的小新
    2021-01-17 05:51

    Your first approach is a normalized design. It should be the default - it's more maintainable, less error-prone, and requires less code overall.
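
    For concreteness, here is a minimal sketch of what the normalized version might look like, using Python's built-in sqlite3 module (the table and column names are my assumptions, since the original diagram isn't shown):

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE authors (
                id   INTEGER PRIMARY KEY,
                name TEXT NOT NULL
            );

            CREATE TABLE articles (
                id        INTEGER PRIMARY KEY,
                author_id INTEGER NOT NULL REFERENCES authors(id),
                title     TEXT NOT NULL
            );

            -- A comment references only its article; the author is always
            -- reachable through articles, so nothing is stored twice.
            CREATE TABLE comments (
                id         INTEGER PRIMARY KEY,
                article_id INTEGER NOT NULL REFERENCES articles(id),
                body       TEXT NOT NULL
            );
        """)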

    The second option is a denormalized design. If you think it through, it requires you to find the author of the article every time someone posts a comment and increment the "comments" field; that's more code, and it makes writing a comment slower. It also means a simple bug in your "create comment" code could break the application logic, and you probably need a transaction around each comment "write" so you can guarantee that the comment insert and the update to "authors.comment_count" succeed or fail together.
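
    To illustrate, a sketch of what every comment "write" would have to do in the denormalized design, assuming the sqlite3 schema above plus a hypothetical comment_count column on authors:

        def add_comment(conn, article_id, body):
            # Both statements must succeed or fail together, so wrap them
            # in one transaction; sqlite3's connection context manager
            # commits on success and rolls back on any exception.
            with conn:
                conn.execute(
                    "INSERT INTO comments (article_id, body) VALUES (?, ?)",
                    (article_id, body),
                )
                # Extra work the normalized design never needs: find the
                # article's author and bump the cached counter.
                conn.execute(
                    """UPDATE authors
                       SET comment_count = comment_count + 1
                       WHERE id = (SELECT author_id
                                   FROM articles WHERE id = ?)""",
                    (article_id,),
                )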

    So, the second option is definitely more complex, and slower for writing comments. It may be faster for querying, but as you'll be joining on primary keys, you will almost certainly not be able to measure that performance impact until you get to a database size of hundreds of millions of records.
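
    For comparison, this is roughly the query the normalized design runs instead of reading a stored counter; every join is on a primary/foreign key, and indexing the foreign-key columns (names assumed as above) keeps it fast:

        # Foreign-key columns are worth indexing for these joins.
        conn.execute("CREATE INDEX idx_articles_author ON articles(author_id)")
        conn.execute("CREATE INDEX idx_comments_article ON comments(article_id)")

        # Comment count per author, computed on demand instead of stored.
        rows = conn.execute("""
            SELECT a.name, COUNT(c.id) AS comment_count
            FROM authors a
            LEFT JOIN articles art ON art.author_id = a.id
            LEFT JOIN comments c ON c.article_id = art.id
            GROUP BY a.id
        """).fetchall()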

    In general, I recommend the following approach; take each step only if the previous steps haven't given you enough performance.

    • design a relational model.
    • tune that relational database (indexes, etc.)
    • improve the hardware - RAM, CPU, SSD disks etc.
    • create a measurement rig so you can identify the performance challenges and run experiments. Create benchmarks based on current and expected data sizes; find a way to fill your test rig with dummy data until you have the data volume you need to scale to (a data-filling sketch follows this list).
    • run your queries on the test rig. Make sure there are no further performance tweaks from indexing or query optimization.
    • introduce application-level caching. In your example, caching the number of comments for an author for 1 hour may be acceptable (see the caching sketch after this list).
    • de-normalize your schema. Use your test rig to prove it gives you the performance you expect.
    • look at more exotic data solutions - sharding, data partitioning etc.
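
    For the measurement-rig step, a sketch of how you might bulk-fill the assumed schema with dummy data until you reach the target volume (the sizes and the random_text helper are made up for illustration):

        import random
        import string

        def random_text(n):
            return "".join(random.choices(string.ascii_lowercase + " ", k=n))

        def fill_dummy_data(conn, n_authors=1000, articles_per_author=20,
                            comments_per_article=50):
            # executemany keeps the inserts reasonably fast; for very large
            # volumes you would batch and commit periodically instead.
            with conn:
                conn.executemany(
                    "INSERT INTO authors (name) VALUES (?)",
                    ((random_text(12),) for _ in range(n_authors)),
                )
                author_ids = [r[0] for r in conn.execute("SELECT id FROM authors")]
                conn.executemany(
                    "INSERT INTO articles (author_id, title) VALUES (?, ?)",
                    ((a, random_text(40))
                     for a in author_ids
                     for _ in range(articles_per_author)),
                )
                article_ids = [r[0] for r in conn.execute("SELECT id FROM articles")]
                conn.executemany(
                    "INSERT INTO comments (article_id, body) VALUES (?, ?)",
                    ((art, random_text(120))
                     for art in article_ids
                     for _ in range(comments_per_article)),
                )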
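
    And for the caching step, a minimal in-process cache with a one-hour TTL; in practice this would more likely live in something like memcached or Redis, but the idea is the same:

        import time

        _cache = {}  # author_id -> (expires_at, count)
        TTL_SECONDS = 3600  # serve a cached count for up to one hour

        def comment_count(conn, author_id):
            now = time.time()
            hit = _cache.get(author_id)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the database entirely

            # Cache miss or expired: run the normalized COUNT query.
            (count,) = conn.execute("""
                SELECT COUNT(c.id)
                FROM articles art
                JOIN comments c ON c.article_id = art.id
                WHERE art.author_id = ?
            """, (author_id,)).fetchone()

            _cache[author_id] = (now + TTL_SECONDS, count)
            return count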

    Denormalization is so far down the list because it introduces real maintenance risks, makes your code much more complex, and in most cases is nowhere near as effective as simply adding an extra 4GB of RAM to your server.
