Database Tables, more the better?

谁说胖子不能爱 submitted on 2019-12-01 00:16:23

The issue here is subtyping. There are three basic approaches to dealing with subtypes.

  1. Put each record type into a completely separate table;
  2. Put a record in a parent table and then a record in a subtype table; and
  3. Put all the records in one table, with nullable columns for the "optional" data (i.e. things that don't apply to that type).
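As a rough illustration, the three layouts could look like the following. This is only a sketch: it uses Python's built-in sqlite3 module so it can be run as-is, and the "login" and "purchase" log subtypes, along with every table and column name, are assumptions made up for the example rather than anything from the question.

```python
import sqlite3

# Hypothetical example: two log record subtypes, "login" and "purchase".
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- (1) A completely separate table per record type.
CREATE TABLE login_log (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    ip_address TEXT NOT NULL
);
CREATE TABLE purchase_log (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    amount     REAL NOT NULL
);

-- (2) A parent table for the shared columns plus one table per subtype.
CREATE TABLE log (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL
);
CREATE TABLE log_login (
    log_id     INTEGER PRIMARY KEY REFERENCES log(id),
    ip_address TEXT NOT NULL
);
CREATE TABLE log_purchase (
    log_id INTEGER PRIMARY KEY REFERENCES log(id),
    amount REAL NOT NULL
);

-- (3) One table for everything; type-specific columns are nullable.
CREATE TABLE log_all (
    id         INTEGER PRIMARY KEY,
    log_type   TEXT NOT NULL,
    created_at TEXT NOT NULL,
    ip_address TEXT,  -- only set for logins
    amount     REAL   -- only set for purchases
);
""")

# List the tables that were created, to confirm the script ran.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```
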

Each strategy has its merits.

For example, (3) is particularly applicable if there is little to no difference between the subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't, or if there are only a few cases where they do, putting them all in one table makes perfect sense.

(2) is commonly used for a Party table. This is a common model in CRMs that involves a parent Party object which has subtypes for Person and Organization (Organization may also have subtypes like Company, Association, etc). Person and Organization have different properties (e.g. salutation, given names, date of birth, etc for Person) so it makes sense to split this up rather than using nullable columns.
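A minimal sketch of such a Party model, assuming SQLite via Python's sqlite3 module. The table names, column names, helper functions and sample rows are all hypothetical, chosen only to show the parent row plus subtype row pattern and the JOIN needed to read a subtype back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE party (
    id         INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL,
    name       TEXT NOT NULL
);
CREATE TABLE person (
    party_id      INTEGER PRIMARY KEY REFERENCES party(id),
    salutation    TEXT,
    date_of_birth TEXT
);
CREATE TABLE organization (
    party_id INTEGER PRIMARY KEY REFERENCES party(id),
    org_kind TEXT NOT NULL  -- e.g. Company, Association
);
""")


def add_person(conn, name, salutation, date_of_birth):
    """Insert the shared parent row first, then the Person subtype row."""
    party_id = conn.execute(
        "INSERT INTO party (party_type, name) VALUES ('person', ?)",
        (name,)).lastrowid
    conn.execute(
        "INSERT INTO person (party_id, salutation, date_of_birth) VALUES (?, ?, ?)",
        (party_id, salutation, date_of_birth))
    return party_id


def add_organization(conn, name, org_kind):
    """Insert the shared parent row first, then the Organization subtype row."""
    party_id = conn.execute(
        "INSERT INTO party (party_type, name) VALUES ('organization', ?)",
        (name,)).lastrowid
    conn.execute(
        "INSERT INTO organization (party_id, org_kind) VALUES (?, ?)",
        (party_id, org_kind))
    return party_id


add_person(conn, "Jane Doe", "Ms", "1980-05-17")
add_organization(conn, "Acme Ltd", "Company")
conn.commit()

# Reading a subtype back requires a JOIN against the parent table.
people = conn.execute("""
    SELECT party.name, person.salutation, person.date_of_birth
    FROM party JOIN person ON person.party_id = party.id
""").fetchall()
orgs = conn.execute("""
    SELECT party.name, organization.org_kind
    FROM party JOIN organization ON organization.party_id = party.id
""").fetchall()
```

The price of this layout is the extra JOIN on every read; the benefit is that Person-only and Organization-only columns never sit empty on rows of the other type.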

(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (3) might be more confusing to developers. You will get a situation where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type simply because it's easier doing that than getting approval from the DBAs to add a column (no, I'm not kidding).

(1) is probably the least frequently used scheme of the 3 in my experience.

Lastly, scalability has to be considered and is probably the best case for (1). At a certain point JOINs don't scale effectively and you'll need to use some kind of partitioning scheme to cut down your table sizes. (1) is one method of doing that (but a crude method).

I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).

It depends. If you're going to have 1,500,000,000 entries of type 1 and 1,000 entries of type 2 and you'll be doing a LOT of queries on type 2, separate the tables. If not, it's more convenient to keep only one table.

Keep in mind scalability:

  • How many entries of each type will I have in 1 year?

  • How many requests on this table will I be doing?

  • Can you, at some point, clear this log? Can you move it to another table (like archiving entries older than X months)?
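The archiving idea in the last bullet might be sketched roughly as follows. This assumes SQLite via Python's sqlite3 module and a hypothetical log table with an ISO-formatted created_at column; the log_archive table and the archive_old_entries helper are names invented for the example.

```python
import sqlite3


def archive_old_entries(conn, cutoff):
    """Move log rows older than `cutoff` (ISO date string) into log_archive."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "INSERT INTO log_archive SELECT * FROM log WHERE created_at < ?",
            (cutoff,))
        moved = conn.execute(
            "DELETE FROM log WHERE created_at < ?", (cutoff,)).rowcount
    return moved


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    message    TEXT NOT NULL
);
CREATE TABLE log_archive (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    message    TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO log (created_at, message) VALUES (?, ?)",
    [
        ("2019-01-15", "old entry"),
        ("2019-02-20", "another old entry"),
        ("2019-11-30", "recent entry"),
    ])
conn.commit()

# Everything before June 2019 goes to the archive table.
moved = archive_old_entries(conn, "2019-06-01")
remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM log_archive").fetchone()[0]
```

Running the copy and the delete in one transaction keeps a row from being lost or duplicated if the job fails halfway through.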

The one drawback I see right now is that you cannot enforce foreign key integrity on your type_id since it points to many different tables.
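To make that drawback concrete, here is a small sketch, again assuming SQLite through Python's sqlite3 module. The users, orders, generic_log and user_log tables are hypothetical stand-ins for the schema in the question, which isn't shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL);

-- Generic log: type_id may mean users.id or orders.id depending on type,
-- so no single FOREIGN KEY clause can describe it.
CREATE TABLE generic_log (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,
    type_id INTEGER NOT NULL
);

-- Dedicated log: user_id always means users.id, so it can be enforced.
CREATE TABLE user_log (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
""")

# No user 999 exists, yet the generic table accepts the row.
conn.execute("INSERT INTO generic_log (type, type_id) VALUES ('user', 999)")
dangling_rows = conn.execute("SELECT COUNT(*) FROM generic_log").fetchone()[0]

# The dedicated table rejects the same dangling reference.
try:
    conn.execute("INSERT INTO user_log (user_id) VALUES (999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```
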

I want to add a small tip. It's a little off topic and quite basic, but it's a lot clearer to use enum instead of tinyint for status flags, e.g.

   enum('user','type')

If there are only two statuses, tinyint may be a little more memory efficient, but it is less clear. A disadvantage of enum is that you put a part of the business logic in the data tier: when you need to add or remove statuses, you have to alter the DB. Otherwise it's much clearer and I prefer enum.
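The enum above is MySQL syntax. As a runnable approximation, the sketch below uses SQLite (through Python's sqlite3 module), which has no enum type, so a CHECK constraint stands in for it; that substitution and the log table with its status column are assumptions made for the example. The trade-off is the same: the allowed values are readable in the schema, but changing them means altering the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log (
        id     INTEGER PRIMARY KEY,
        -- Stand-in for MySQL's enum('user','type'); SQLite has no enum type.
        status TEXT NOT NULL CHECK (status IN ('user', 'type'))
    )
""")

# A listed value is accepted.
conn.execute("INSERT INTO log (status) VALUES ('user')")

# A value outside the list is rejected by the database itself.
try:
    conn.execute("INSERT INTO log (status) VALUES ('unknown')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = [row[0] for row in conn.execute("SELECT status FROM log")]
```
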

I would keep things as specific as possible - in this case I would create two tables. Each table has a specific purpose so I cannot see why you would combine them.

I wouldn't do what vBulletin does. The problem with older apps like vBulletin is that while they might have started as lean machines, over time they collect a lot of entropy and end up being bloated. Since there are plugins, and third-party tools, and developers who've worked on the old code, breaking it is a tough choice.

That's why there is not much refactoring going on here. Don't make them your programming model. Look around, find out what works best and use that. A lot of tables sounds like a bad thing to me, not a good one.
