How to store meta-data on columns

Let's say you're collecting insider info on upcoming superhero movie releases and your main Movie table looks something like this:

Table 1

9 answers

清歌不尽

2020-12-28 09:58

My response may seem a bit too philosophical for SO. Bear with me.

I think that the "Source" column isn't subject matter data, but rather meta-data. It's really data about how we come to know some other bit of data. That makes it data about data, and that's meta-data.

Among the reasons why EAV causes the problems that it does is the fact that it intermixes data and metadata in a single row. There are times when I've deliberately done that myself, as an intermediate step towards a result I want to achieve. But I've tried never to intermix data and metadata in my deliverables.

I know why I never did that, but I can't explain it concisely.

离开以前

2020-12-28 09:58

Since no one else is really taking a crack at it, I'm going to answer my own question. I'm pretty sure an EAV-like table is indeed the only way to go. To store metadata on each column (regarding the source and journalist in this case), you're really treating each column as an entity in itself, which is what an EAV allows.

You could go other routes, like adding a second and third column for each original column to store its source and journalist, but that is definitely breaking some fundamental normalization rules and will probably only cause you pain later.

再見小時候

2020-12-28 09:59

Another approach to consider is Class Table Inheritance. Bill Karwin has a great review of EAV options in this SO answer, and lots of good context.

清歌不尽

2020-12-28 10:14

Interesting scenario. You could get around the EAV ghetto-ness by thinking about your entities as first-class objects; let's call them Facts. And it helps that you're pretty orthogonal in this case, in that every movie has the exact same four facts. Your EAV table can be your pristine/correct table, and then you can have an outside process that mines that table and replicates the data into a properly normalized form (i.e. your first table). This way you have the data you want, with its metadata, and you have an easy way to query for movie information that is as current as the last run of your mining process.

I think you definitely need some "out-of-database" muscle to make sure the data remains valid, since there doesn't seem to be any in-database way of maintaining integrity across your regular and EAV tables. I guess with a complex series of triggers you can pretty much accomplish anything, but one human administrator who "gets" your problem is probably much easier to handle.
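To make the "outside process" idea a bit more concrete, here is a minimal sketch in Python with SQLite. The table names `Fact` and `Movie`, the column names, and the `rebuild_movie_table` helper are all assumptions made up for illustration; the sample rows are borrowed from the example data elsewhere in this thread. It simply wipes and refills the wide table from the fact table in one transaction, so readers never see a half-built copy.

```python
import sqlite3

# Hypothetical mapping: fact name in the EAV table -> column in the wide table.
FACT_COLUMNS = {
    "Director": "Director",
    "Leading Male": "LeadingMale",
    "Leading Female": "LeadingFemale",
    "Villain": "Villain",
}


def rebuild_movie_table(conn):
    """Refill the wide Movie table from the Fact table in a single transaction."""
    movies = {}
    query = "SELECT Movie, FactName, FactValue FROM Fact"
    for movie, fact_name, fact_value in conn.execute(query):
        if fact_name in FACT_COLUMNS:
            movies.setdefault(movie, {})[FACT_COLUMNS[fact_name]] = fact_value
    rows = [
        {"Name": name, **{col: facts.get(col) for col in FACT_COLUMNS.values()}}
        for name, facts in movies.items()
    ]
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM Movie")
        conn.executemany(
            "INSERT INTO Movie (Name, Director, LeadingMale, LeadingFemale, Villain) "
            "VALUES (:Name, :Director, :LeadingMale, :LeadingFemale, :Villain)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fact (
    Movie TEXT, FactName TEXT, FactValue TEXT, Source TEXT, Journalist TEXT,
    PRIMARY KEY (Movie, FactName)
);
CREATE TABLE Movie (
    Name TEXT PRIMARY KEY, Director TEXT, LeadingMale TEXT,
    LeadingFemale TEXT, Villain TEXT
);
""")
conn.executemany("INSERT INTO Fact VALUES (?, ?, ?, ?, ?)", [
    ("Green Lantern", "Director", "Kubrick", "CHUD", "Sarah"),
    ("Green Lantern", "Leading Male", "Robert Redford", "CHUD", "James"),
    ("Green Lantern", "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
    ("Green Lantern", "Villain", "Hugh Grant", "CHUD", "Sarah"),
])

rebuild_movie_table(conn)
wide_rows = conn.execute(
    "SELECT Name, Director, LeadingMale, LeadingFemale, Villain FROM Movie"
).fetchall()
print(wide_rows)
```

Because the rebuild is a full replace, running it twice gives the same result, which keeps the scheduling side simple. Whether a full rebuild is affordable depends on how large the fact table gets.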

广开言路

2020-12-28 10:19

I would make my decision based on what I need to code.

If src/journo is simply additional info, I would go for further columns. But if I know I'm going to end up building complicated src/journo queries, I would go EAV, as it'll be easier to search for a journalist's references down the meta table than having to go into LeadingFemaleJournalist and VillainJournalist etc.

Personally - I would be inclined to dump the src/journo meta-data into another table EAV-style, but use a FK to an Attribute definition table. Having a freeform Attribute text field is a recipe for disaster - always control your attributes through a constraint. Triggers could be implemented to improve referential integrity if required.

For me, it comes down to point-of-view. Do you see sources and journalists as relational concerns in their own right, or are they just additional pieces of data to complement a Movie? The next level of refinement would be to create different tables for MovieDataSource and MovieDataJournalist, which would allow you to map FKs to tables defining valid Sources and Journalists (and further information on those Sources/Journalists could then be fleshed out). What you will have done here is to establish a many-to-many relationship between the Movie entity and the Source (and also Journalist) entity.
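A rough sketch of the "control your attributes through a constraint" idea, using Python and SQLite. The `Attribute` and `MovieMeta` table names and their columns are hypothetical, chosen only for this example; the sample rows come from the example data elsewhere in this thread. It shows an undefined attribute being rejected by the FK, and how a journalist's references can be found with one query against the meta table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Attribute definition table: the only legal attribute names live here.
CREATE TABLE Attribute (
    AttributeID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL UNIQUE
);
-- EAV-style meta table; AttributeID is constrained by the FK.
CREATE TABLE MovieMeta (
    MovieID     INTEGER NOT NULL,
    AttributeID INTEGER NOT NULL REFERENCES Attribute(AttributeID),
    Source      TEXT,
    Journalist  TEXT,
    PRIMARY KEY (MovieID, AttributeID)
);
INSERT INTO Attribute (AttributeID, Name) VALUES
    (1, 'Director'), (2, 'LeadingMale'), (3, 'LeadingFemale'), (4, 'Villain');
""")

conn.executemany("INSERT INTO MovieMeta VALUES (?, ?, ?, ?)", [
    (1, 1, "CHUD", "Sarah"),
    (1, 2, "CHUD", "James"),
    (1, 3, "Dark Horizons", "James"),
])

# An attribute id that is not in the definition table is refused.
try:
    conn.execute("INSERT INTO MovieMeta VALUES (1, 99, 'CHUD', 'Sarah')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# All references for one journalist, without touching per-attribute columns.
james_refs = conn.execute(
    """
    SELECT m.MovieID, a.Name
    FROM MovieMeta m
    JOIN Attribute a ON a.AttributeID = m.AttributeID
    WHERE m.Journalist = ?
    ORDER BY m.MovieID, a.Name
    """,
    ("James",),
).fetchall()
print(rejected, james_refs)
```

The same pattern would extend to separate Source and Journalist tables by replacing the two text columns with FKs, at the cost of a couple more joins.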

傲寒

2020-12-28 10:20
You can change what you consider a fact value in your design. It seems that a fact in your data model could be expressed as the following N-tuple:
```
Movie | FactType | FactValue | FactSource | FactJournalist
```
The following table structures should support the data model you want, and can relatively easily be indexed and joined. You can also create a view that pivots out just the fact value and fact type so that you can create the following perspective:
```
MovieID | Movie Name | Director | LeadingMale | LeadingFemale | PrimaryVillain | etc
```
Interestingly, you could consider this to be the logical extension of fully applying an EAV model to the data, and decomposing an individual movie (with its intuitive attribution of director, lead, villain, etc.) into a pivoted structure where attributes focus on the source of the information instead.

The benefits of the proposed data model are:
- it is well-normalized (though you should probably normalize the FactType field into a reference table for completeness)
- it is possible to create a view that pivots fact types efficiently out into a tabular structure
- it is relatively extensible and allows the database to enforce referential integrity and (if desired) cardinality constraints
- the MovieFact table can be subclassed to support different kinds of movie facts, not just those that are simple text fields
- simple queries against the data are relatively efficient
Some of the disadvantages of the data model are:
- Composite, conditional queries are harder (but not impossible) to write (e.g. find all movies where Director is A and Leading Male is B, etc...)
- The model is somewhat less obvious than the more traditional approach, or one involving EAV structures
- inserts and updates are a little trickier because updating multiple facts requires updating multiple rows, not multiple columns
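To illustrate the first disadvantage, here is one way a composite query ("Director is A and Leading Male is B") might be written against a MovieFact-shaped table, sketched in Python with SQLite. The `movies_with` helper is hypothetical, and the self-join is just one option (a `GROUP BY ... HAVING` form would also work); the rows are taken from the example data in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieFact (
        MovieID INTEGER, FactType TEXT, FactValue TEXT,
        FactSource TEXT, FactJournalist TEXT,
        PRIMARY KEY (MovieID, FactType)
    )
""")
conn.executemany("INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)", [
    (1, "Director", "Kubrick", "CHUD", "Sarah"),
    (1, "Leading Male", "Robert Redford", "CHUD", "James"),
    (2, "Director", "Mel Gibson", "Yahoo", "Cameron"),
    (2, "Leading Male", "John Lambert", "Yahoo", "Erica"),
])


def movies_with(conn, director, leading_male):
    """Return MovieIDs whose Director and Leading Male facts both match."""
    sql = """
        SELECT d.MovieID
        FROM MovieFact d
        JOIN MovieFact lm ON lm.MovieID = d.MovieID
        WHERE d.FactType = 'Director' AND d.FactValue = ?
          AND lm.FactType = 'Leading Male' AND lm.FactValue = ?
        ORDER BY d.MovieID
    """
    return [row[0] for row in conn.execute(sql, (director, leading_male))]


print(movies_with(conn, "Kubrick", "Robert Redford"))
print(movies_with(conn, "Kubrick", "John Lambert"))
```

Each extra condition adds another join on the same table, which is the "harder but not impossible" part: with plain columns it would be a single `WHERE` clause.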
I've moved the Movie data up a level to normalize the structure, and you could push the movie name down into the MovieFact structure for consistency (since for some movies I can imagine even the name is something you may want to track source information for).
```
Table Movie
========================
MovieID   NUMBER, PrimaryKey
MovieName VARCHAR

Table MovieFact
========================
MovieID          NUMBER,  PrimaryKeyCol1
FactType         VARCHAR, PrimaryKeyCol2
FactValue        VARCHAR
FactSource       VARCHAR
FactJournalist   VARCHAR
```
Your fictional movie data would then look like the following:
```
Movie Table
====================================================================================
MovieID  MovieName
====================================================================================
1        Green Lantern
2        The Tick

MovieFact Table
====================================================================================
MovieID  FactType       FactValue         FactSource       FactJournalist
====================================================================================
1        Director       Kubrick           CHUD             Sarah
1        Leading Male   Robert Redford    CHUD             James
1        Leading Female Miley Cyrus       Dark Horizons    James
1        Villain        Hugh Grant        CHUD             Sarah
2        Director       Mel Gibson        Yahoo            Cameron
2        Leading Male   John Lambert      Yahoo            Erica
...
```
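For anyone who wants to try this out, below is a minimal sketch of the two tables and the pivot view in Python with SQLite. The SQLite column types, the view name `MoviePivot`, and the `MAX(CASE ...)` pivoting technique are my own assumptions for illustration; other databases have dedicated pivot syntax that may be a better fit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Movie (
    MovieID   INTEGER PRIMARY KEY,
    MovieName TEXT NOT NULL
);
CREATE TABLE MovieFact (
    MovieID        INTEGER NOT NULL REFERENCES Movie(MovieID),
    FactType       TEXT    NOT NULL,
    FactValue      TEXT,
    FactSource     TEXT,
    FactJournalist TEXT,
    PRIMARY KEY (MovieID, FactType)
);
-- Pivot view: one row per movie, one column per fact type.
CREATE VIEW MoviePivot AS
SELECT m.MovieID,
       m.MovieName,
       MAX(CASE WHEN f.FactType = 'Director'       THEN f.FactValue END) AS Director,
       MAX(CASE WHEN f.FactType = 'Leading Male'   THEN f.FactValue END) AS LeadingMale,
       MAX(CASE WHEN f.FactType = 'Leading Female' THEN f.FactValue END) AS LeadingFemale,
       MAX(CASE WHEN f.FactType = 'Villain'        THEN f.FactValue END) AS PrimaryVillain
FROM Movie m
LEFT JOIN MovieFact f ON f.MovieID = m.MovieID
GROUP BY m.MovieID, m.MovieName;
""")

conn.executemany("INSERT INTO Movie VALUES (?, ?)", [
    (1, "Green Lantern"),
    (2, "The Tick"),
])
conn.executemany("INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)", [
    (1, "Director", "Kubrick", "CHUD", "Sarah"),
    (1, "Leading Male", "Robert Redford", "CHUD", "James"),
    (1, "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
    (1, "Villain", "Hugh Grant", "CHUD", "Sarah"),
    (2, "Director", "Mel Gibson", "Yahoo", "Cameron"),
    (2, "Leading Male", "John Lambert", "Yahoo", "Erica"),
])

pivot_rows = conn.execute("SELECT * FROM MoviePivot ORDER BY MovieID").fetchall()
for row in pivot_rows:
    print(row)
```

Facts that have not been recorded yet (for example, the leading female for The Tick in the rows shown) come back as NULL in the view, so the pivoted shape stays stable as facts trickle in.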