SQL design approach for searching a table with an unlimited number of bit fields

失恋的感觉 2020-12-31 23:01

Consider searching a table that contains Apartment Rental Information: A client using the interface selects a number of criteria that are represented as bit fields in the D

5 answers

南方客 (original poster)

2020-12-31 23:44
I've walked down this path a few times trying to store health status markers!

When I first started (in 2000?) I tried a character position approach (your #2) and found that it quickly became pretty unwieldy as I wrestled with the same questions over and over: "which position held 'Allows Pets' again?" or, worse yet, "how long is this string now? / which position am I on?" Can you work around this problem by developing objects to manage things for you? Well, yes, to an extent. But I really didn't appreciate how much extra work it cost compared to having the field identities managed for me by the database.

The second time around, I used an attribute/value pair approach similar to your solution #3. This basically worked and, for specialty needs, I still generate attribute/value pairs using a PIVOT. Also, my background is in AI and we used attribute/value pairs all the time in mechanical theorem proving so this was very natural for me.

However, there is a huge problem with this approach: pulling any one fact out ("Show me the apartments that allow pets") is easy but pulling all of the records meeting multiple constraints quickly gets very, very ugly (see my example below).

**SO...** I ended up adding fields to a table. I understand the theoretical reasons that Jon and 'Unknown' and 'New In Town' give for preferring other approaches and I'd have agreed with either or both at one point. But experience is a pretty harsh teacher...

A Couple More Things

First, I disagree that adding more bit fields is a nightmare of maintenance - at least compared with a character-bit approach (your #2). That is, having a distinct field for each attribute ensures that there is no 'management' necessary to figure out which slot belongs to which attribute.

Second, having 300 fields isn't really the problem - any decent database can handle that without trouble.

Third, your real issue and the source of pain is really the matter of dynamically generating your queries. If you are like me, this question is really all about "Do I really have to have this massive, grody and inelegant chain of 'IF' statements to construct a query?"

The answer, unfortunately, is Yes. All three of the approaches you suggest will still boil down to a chain of IF statements.

In a database bit-field approach, you'll end up with a series of IF statements where all of your columns have to be added like so:
```
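// Build the WHERE clause one bit column at a time.
// "Apartments" is a placeholder table name; X, Y, Z are placeholder columns.
string SQL = "Select X,Y,Z From Apartments Where ";

if (AllowsPets == 0)
  SQL += "(AllowsPets = 0) AND ";
else if (AllowsPets == 1)
  SQL += "(AllowsPets = 1) AND ";  // Else AllowsPets not in query
// ... one IF block like the above per remaining column ...
SQL = SQL.Substring(0, SQL.Length - 4);  // Get rid of trailing 'AND ' / alternatively append '(1=1)'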
```
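For what it's worth, the chain can at least be made data-driven. Here is a minimal sketch in Python with the standard `sqlite3` module (swapped in only because it runs as-is; the pseudocode above is C#-style). The `Apartments` table, the `AptID` column and the `BIT_COLUMNS` whitelist are assumptions for illustration. It is still one conditional per column, just expressed as a loop, and it binds values as parameters instead of concatenating them.

```python
import sqlite3

# Hypothetical whitelist of bit columns; only these names ever reach the SQL text.
BIT_COLUMNS = ("AllowsPets", "AllowsChildren", "HasParking")


def build_query(criteria):
    """criteria maps column name -> 0, 1, or None (None = not in query)."""
    clauses, params = [], []
    for column in BIT_COLUMNS:
        wanted = criteria.get(column)
        if wanted is None:
            continue  # attribute is not part of this search
        clauses.append(f"({column} = ?)")
        params.append(int(wanted))
    # With no criteria at all, fall back to the '(1=1)' trick.
    where = " AND ".join(clauses) if clauses else "(1=1)"
    return f"SELECT AptID FROM Apartments WHERE {where}", params


def demo():
    # Tiny in-memory table, purely illustrative data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Apartments (AptID INTEGER PRIMARY KEY, "
        "AllowsPets INTEGER, AllowsChildren INTEGER, HasParking INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Apartments VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 0), (2, 1, 0, 1), (3, 0, 1, 1)],
    )
    sql, params = build_query({"AllowsPets": 1, "HasParking": 1})
    ids = [row[0] for row in conn.execute(sql + " ORDER BY AptID", params)]
    return sql, ids
```

Adding a new searchable attribute then means adding a column and one entry in the whitelist, rather than another hand-written IF block.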
In a character-position approach, you'll do the same thing but your "Appends" will add "0", "1" or "_" to your SQL. You'll also, of course, run into the maintenance issues deciding which one is which that I discussed above (enums help but don't completely solve the problem).
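A rough sketch of that character-position variant, again in Python with `sqlite3` as a stand-in: each slot contributes "0", "1" or "_" to a `LIKE` pattern. The `Flags` column and the `POSITIONS` ordering are hypothetical, and the ordering is exactly the bookkeeping that has to be maintained by hand.

```python
import sqlite3

# Hypothetical slot order: character i of the Flags string belongs to POSITIONS[i].
POSITIONS = ("AllowsPets", "AllowsChildren", "HasParking")


def build_pattern(criteria):
    """Return a LIKE pattern: '0' or '1' for a chosen value, '_' for "don't care"."""
    return "".join(
        "_" if criteria.get(name) is None else str(int(criteria[name]))
        for name in POSITIONS
    )


def demo():
    # Tiny in-memory table, purely illustrative data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Apartments (AptID INTEGER PRIMARY KEY, Flags TEXT)")
    conn.executemany(
        "INSERT INTO Apartments VALUES (?, ?)",
        [(1, "110"), (2, "101"), (3, "011")],
    )
    pattern = build_pattern({"AllowsPets": 1, "HasParking": 1})
    rows = conn.execute(
        "SELECT AptID FROM Apartments WHERE Flags LIKE ? ORDER BY AptID",
        (pattern,),
    )
    return pattern, [row[0] for row in rows]
```

Note that inserting a new attribute anywhere but the end of `POSITIONS` would silently change the meaning of every stored string.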

As mentioned above, the Attribute-Value approach is actually the worst. You'll have to either create a nasty chain of sub-queries (which surely will cause a stack overflow of some sort with 300 clauses) or you need to have an IF-THEN like this:
```
// Kill any previously stored selections.
SQLObject.Execute("Delete From SelectedApts Where SessionKey=X");
// Start with your first *known* attr/value and fill a table with the results.
// ...
// Logic to pick first known attr/value pair
// ...
SQLObject.Execute("Insert Into SelectedApts Select X as SessionKey, AptID From AttrValue Where AllowsPets=1");

// Now you have the widest set that meets your criteria. Time to whittle it down.
if (HasParking == 1)
  SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where HasParking=1)");
if (AllowsChildren == 0)
  SQLObject.Execute("Delete From SelectedApts Where AptID not in (Select AptID From AttrValue Where AllowsChildren=0)");
// ...
// Perform 2-300 more queries to keep whittling down your set to the actual match.
Now, you may be able to optimize this a bit so you run fewer queries (a PIVOT, sets of subqueries or using the UNION operator) but the fact is that this gets VERY expensive compared to the single query that you can use (but have to build) using the other approaches.
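To make the sub-query option concrete, here is a small sketch (Python and `sqlite3` again, as a stand-in) that folds the whittling steps into one statement with one `IN (...)` sub-query per criterion. The `AttrValue(AptID, Attr, Value)` layout is my assumption about what such a table would look like, not something from the question, and whether a database copes with hundreds of these clauses is exactly the concern raised above.

```python
import sqlite3


def build_query(criteria):
    """criteria maps attribute name -> required value (0 or 1)."""
    clauses, params = [], []
    for attr, value in criteria.items():
        # One sub-query per constraint; all of them must hold for an apartment.
        clauses.append(
            "AptID IN (SELECT AptID FROM AttrValue WHERE Attr = ? AND Value = ?)"
        )
        params.extend([attr, int(value)])
    where = " AND ".join(clauses) if clauses else "(1=1)"
    sql = f"SELECT DISTINCT AptID FROM AttrValue WHERE {where} ORDER BY AptID"
    return sql, params


def demo():
    # Tiny in-memory attribute/value table, purely illustrative data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AttrValue (AptID INTEGER, Attr TEXT, Value INTEGER)")
    conn.executemany(
        "INSERT INTO AttrValue VALUES (?, ?, ?)",
        [
            (1, "AllowsPets", 1), (1, "AllowsChildren", 1),
            (2, "AllowsPets", 1), (2, "AllowsChildren", 0),
            (3, "AllowsPets", 0), (3, "AllowsChildren", 0),
        ],
    )
    sql, params = build_query({"AllowsPets": 1, "AllowsChildren": 0})
    return [row[0] for row in conn.execute(sql, params)]
```

Even in this compact form, every criterion costs another pass over the attribute/value table, whereas the bit-column version needs only one extra predicate per criterion.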

Thus, this is a painful kind of problem no matter what approach you take - there really is no magic that helps you to avoid it. But, having been there before, I would absolutely recommend approach #1.

Update: If you are really focused on pulling straight criteria matches ("All Apartments That Have A, B and C") and don't need other queries (like "...Sum(AllowsPets), Sum(AllowsChildren)..." or "...(AllowsPets=1) OR (AllowsChildren=1)...") then I really like KM's answer the more I look at it. It is very clever and looks likely to be acceptably fast.
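For reference, the "other queries" I have in mind look roughly like this when each attribute is its own column. This is an illustrative sketch only (Python with `sqlite3`; the table and data are made up):

```python
import sqlite3


def demo():
    # Tiny in-memory table, purely illustrative data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Apartments (AptID INTEGER PRIMARY KEY, "
        "AllowsPets INTEGER, AllowsChildren INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Apartments VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 0), (3, 0, 0)],
    )
    # Aggregate: how many apartments allow pets / children.
    totals = conn.execute(
        "SELECT SUM(AllowsPets), SUM(AllowsChildren) FROM Apartments"
    ).fetchone()
    # Disjunction: apartments that allow pets OR children.
    either = [
        row[0]
        for row in conn.execute(
            "SELECT AptID FROM Apartments "
            "WHERE (AllowsPets = 1) OR (AllowsChildren = 1) ORDER BY AptID"
        )
    ]
    return totals, either
```

With distinct columns, both kinds of query are plain SQL with no extra machinery.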