Database - Designing an “Events” Table

烂漫一生 submitted on 2019-11-28 16:24:21

I highly recommend this approach. Since you're presumably using the same database for OLTP and OLAP, you can gain significant performance benefits by adding some star and snowflake schemas.

I have a social networking app that currently has 65 tables. I maintain a single table to track object views (blog/post, forum/thread, gallery/album/image, etc.), another for object recommendations, and a third to summarize insert/update activity in a dozen other tables.

One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the 'TABLE' column). You would want to do the same thing with an event_type table.

Clarifying for Alix: yes, you maintain a reference table for objects and a reference table for events (these would be your dimension tables). Your fact table would have the following fields:

  • id
  • object_id
  • event_id
  • event_time
  • ip_address
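A minimal sketch of that star layout, using Python's standard sqlite3 module so it runs without a database server. The dimension tables are named entity_type and event_type after the previous answer; the fact table name event_fact, the name columns, and the sample rows are hypothetical, and the sketch assumes object_id references entity_type and event_id references event_type.

```python
import sqlite3

# Hypothetical star layout: two dimension (reference) tables and one fact table.
# Names not listed in the answer (event_fact, name, sample rows) are assumptions.
SCHEMA = """
CREATE TABLE entity_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- e.g. 'blog_post', 'image'
);
CREATE TABLE event_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- e.g. 'view', 'recommend'
);
CREATE TABLE event_fact (
    id         INTEGER PRIMARY KEY,
    object_id  INTEGER NOT NULL REFERENCES entity_type(id),
    event_id   INTEGER NOT NULL REFERENCES event_type(id),
    event_time TEXT    NOT NULL,       -- ISO-8601 timestamp
    ip_address TEXT                    -- plain attribute; nothing to reference
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entity_type (id, name) VALUES (?, ?)",
                     [(1, "blog_post"), (2, "image")])
    conn.executemany("INSERT INTO event_type (id, name) VALUES (?, ?)",
                     [(1, "view"), (2, "recommend")])
    conn.executemany(
        "INSERT INTO event_fact (object_id, event_id, event_time, ip_address) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "2019-11-28T16:00:00", "192.0.2.10"),
         (1, 2, "2019-11-28T16:05:00", "192.0.2.11"),
         (2, 1, "2019-11-28T16:10:00", "192.0.2.10")],
    )
    return conn


def counts_by_type(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    # Typical star-schema query: join the fact table to both dimensions, then aggregate.
    return conn.execute(
        """
        SELECT o.name, e.name, COUNT(*)
        FROM event_fact f
        JOIN entity_type o ON o.id = f.object_id
        JOIN event_type  e ON e.id = f.event_id
        GROUP BY o.name, e.name
        ORDER BY o.name, e.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(counts_by_type(build_db()))
```

Because both IDs point at fixed reference tables, the DBMS can enforce them with ordinary foreign keys, while ip_address stays a plain attribute.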

It looks like a pretty reasonable design, so I just wanted to challenge a few of your assumptions to make sure you had concrete reasons for what you're doing.

"In my fully normalized database this adds up to about 8 to 10 additional tables."

These are all statistics derived from existing data, aren't they? (Update: okay, they're not, so disregard the following.) Why wouldn't these simply be views, or even materialized views?

Gathering those statistics may seem like a slow operation; however:

  • proper indexing can make it quite fast
  • it's not a common operation, so the speed doesn't matter all that much
  • eliminating redundant data might make other common operations fast and reliable
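The answer's original suggestion (before the update) can be sketched as follows: keep only the raw event log and compute the statistics with an indexed view instead of storing them in extra tables. This is a minimal sketch using Python's sqlite3 module; the table, index, and view names and the 'view'/'recommend' event names are hypothetical.

```python
import sqlite3

# Hypothetical raw log table; per-object statistics are derived from it, not stored.
SCHEMA = """
CREATE TABLE object_event (
    id         INTEGER PRIMARY KEY,
    object_id  INTEGER NOT NULL,
    event_name TEXT    NOT NULL,   -- e.g. 'view', 'recommend'
    event_time TEXT    NOT NULL
);

-- Composite index on the columns the view reads; SQLite can scan it instead of the table.
CREATE INDEX idx_object_event ON object_event (object_id, event_name);

-- Plain view: recomputed on every query, so it can never drift from the log.
CREATE VIEW object_stats AS
SELECT object_id,
       SUM(event_name = 'view')      AS views,       -- comparison yields 0 or 1
       SUM(event_name = 'recommend') AS recommends
FROM object_event
GROUP BY object_id;
"""


def object_stats(rows: list[tuple[int, str, str]]) -> list[tuple[int, int, int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO object_event (object_id, event_name, event_time) VALUES (?, ?, ?)",
        rows,
    )
    return conn.execute(
        "SELECT object_id, views, recommends FROM object_stats ORDER BY object_id"
    ).fetchall()


if __name__ == "__main__":
    sample = [
        (1, "view", "2019-11-28T16:00:00"),
        (1, "view", "2019-11-28T16:01:00"),
        (1, "recommend", "2019-11-28T16:02:00"),
        (2, "view", "2019-11-28T16:03:00"),
    ]
    print(object_stats(sample))  # [(1, 2, 1), (2, 1, 0)]
```

SQLite has no materialized views; PostgreSQL, for example, offers CREATE MATERIALIZED VIEW plus REFRESH MATERIALIZED VIEW, which trades freshness for read speed.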

"I've come up with a table schema that would separate highly volatile data from other tables subjected to heavy reads."

I guess you're talking about how the user events (just to pick one table), which would be pretty volatile, are separated from the user data. I agree that they ought to be separate, but more because they're fundamentally different data. What someone is and what someone does are two different things.

I don't think volatility is so important. The DBMS should already allow you to put the log file and database file on separate devices, which accomplishes the same thing, and contention shouldn't be an issue with row-level locking.

"Non-relational (still not as bad as EAV)"

I think you're missing the forest for the trees, so to speak.

The predicate for your table would be "User ID from IP IP at time DATE EVENTed to TABLE", which seems reasonable, but there are issues. (Update: Okay, so it's sort of kinda like that.)

You can still join, say, user events to users, but you can't implement a foreign key constraint, because the table each row refers to is stored as data in the TABLE column rather than fixed by the schema. That's why EAV is generally problematic; whether or not something is exactly EAV doesn't really matter. A constraint is generally one or two lines of code in your schema, but the equivalent checks in your app could be dozens of lines, and if the same data is accessed in multiple places by multiple apps, that can easily multiply to thousands of lines. So, generally, if you can prevent bad data with a foreign key constraint, you're guaranteed that no app can insert it.
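To make the foreign key argument concrete, here is a minimal sketch (Python's sqlite3, hypothetical table names app_user, generic_event, and user_event) contrasting a generic events table, where the target table is stored as data, with a per-entity user_event table. Both inserts reference a user that does not exist; only the second is rejected. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

SCHEMA = """
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Generic design: the target table is data, so no FOREIGN KEY can be declared.
CREATE TABLE generic_event (
    id         INTEGER PRIMARY KEY,
    table_name TEXT    NOT NULL,
    row_id     INTEGER NOT NULL,
    event_name TEXT    NOT NULL
);

-- Per-entity design: the target is fixed, so the DBMS can enforce it.
CREATE TABLE user_event (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES app_user(id),
    event_name TEXT    NOT NULL
);
"""


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def demo() -> tuple[bool, bool]:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO app_user (id, name) VALUES (1, 'alice')")
    # User 42 does not exist in either case.
    generic_ok = try_insert(
        conn,
        "INSERT INTO generic_event (table_name, row_id, event_name) VALUES (?, ?, ?)",
        ("app_user", 42, "login"),
    )
    fk_ok = try_insert(
        conn,
        "INSERT INTO user_event (user_id, event_name) VALUES (?, ?)",
        (42, "login"),
    )
    return generic_ok, fk_ok


if __name__ == "__main__":
    print(demo())  # (True, False): only the constrained table rejects the bad row
```

The one REFERENCES clause does the work that every application writing to generic_event would otherwise have to repeat.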

You might think that events aren't so important, but, as an example, ad impressions are money. I would definitely want to catch any bugs relating to ad impressions as early in the design process as possible.

Further comment

"I can spot some caveats but as long as the app messing with the database knows what it is doing I guess there shouldn't be any problems."

And with some caveats, you can make a very successful system. With a proper system of constraints, you get to say, "if any app messing with the database doesn't know what it's doing, the DBMS will flag an error." That may require more time and money than you've got, so something simpler that you can have is probably better than something more perfect that you can't. C'est la vie.

I can't add a comment to Ben's answer, so two things...

First, it would be one thing to use views in a standalone OLAP/DSS database; it's quite another to use them in your transaction database. The authors of High Performance MySQL recommend against using views where performance matters.

Second, with regard to data integrity, I agree, and that's another advantage of using a star or snowflake schema with 'events' as the central fact table (as well as using multiple event tables, as I do). But you cannot design a referential integrity scheme around IP addresses.
