Best representation of an ordered list in a database?

Posted by 荒凉一梦 on 2019-12-17 15:37:10

Question


I know that this sort of goes against the principles of a relational database but let me describe the situation.

I have a page where the user will place a number of items.

 ________________
| -Item1         |
| -Item2         |
| -Item3         |
| -Item4         |
|________________|

These items must stay in the order the user gives them. However, the user may change this order an arbitrary number of times.

 ________________
| -Item1         |
| -Item4         |
| -Item2         |
| -Item3         |
|________________|

Approach 1

My original thought was to give the items an index to represent their place in the list.

Page           Item
-----------    ---------------
FK | pid       FK | pid 
   | name      PK | iid 
                  | index
                  | content 

With this solution you can select items where pid = Page.pid and order by index which is convenient. However every time you change the order you have to change anywhere between one other item (best case) and all the other items (worst case).
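
The cost described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table is reduced to a single `item` table, the column is named `idx` because `index` is a reserved word in SQL, and `move_item` and `ordered_contents` are hypothetical helper names.

```python
import sqlite3

def move_item(conn, pid, iid, new_idx):
    """Move item `iid` on page `pid` to position `new_idx`, shifting the items in between."""
    (old_idx,) = conn.execute(
        "SELECT idx FROM item WHERE pid = ? AND iid = ?", (pid, iid)
    ).fetchone()
    if new_idx < old_idx:
        # Moving up: every item in [new_idx, old_idx) slides down one place.
        conn.execute(
            "UPDATE item SET idx = idx + 1 WHERE pid = ? AND idx >= ? AND idx < ?",
            (pid, new_idx, old_idx),
        )
    elif new_idx > old_idx:
        # Moving down: every item in (old_idx, new_idx] slides up one place.
        conn.execute(
            "UPDATE item SET idx = idx - 1 WHERE pid = ? AND idx > ? AND idx <= ?",
            (pid, old_idx, new_idx),
        )
    conn.execute("UPDATE item SET idx = ? WHERE pid = ? AND iid = ?", (new_idx, pid, iid))

def ordered_contents(conn, pid):
    """Return the page's item contents in list order."""
    rows = conn.execute("SELECT content FROM item WHERE pid = ? ORDER BY idx", (pid,))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (iid INTEGER PRIMARY KEY, pid INTEGER, idx INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, 1, ?, ?)",
    [(1, 1, "Item1"), (2, 2, "Item2"), (3, 3, "Item3"), (4, 4, "Item4")],
)

# Reproduce the reordering from the second diagram: Item4 moves to second place.
move_item(conn, 1, 4, 2)
print(ordered_contents(conn, 1))
```

Moving one item touches every row between the old and the new position, which is the best-case/worst-case spread mentioned above.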

Approach 2

I also considered making a "linked list" like data structure where each item points to the next item in the list.

Page           Item
-----------    ---------------
FK | pid       FK | pid 
   | name      PK | iid 
                  | next
                  | content 

This potentially makes changing the order less expensive but we would have to rely on front end programming to extract the order.
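
As a rough idea of what that front-end step could look like, here is a minimal Python sketch. It assumes the rows for one page have already been fetched into a dict mapping `iid` to `(content, next)`; `find_head` and `order_from_links` are hypothetical helper names.

```python
def find_head(items):
    """Return the iid that no other item points to (the first item of the list)."""
    pointed_to = {nxt for _, nxt in items.values() if nxt is not None}
    heads = [iid for iid in items if iid not in pointed_to]
    if len(heads) != 1:
        raise ValueError("expected exactly one head, found %d" % len(heads))
    return heads[0]

def order_from_links(items):
    """Follow the `next` pointers from the head and return contents in list order."""
    ordered = []
    seen = set()
    current = find_head(items)
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected at item %r" % current)
        seen.add(current)
        content, nxt = items[current]
        ordered.append(content)
        current = nxt
    return ordered

# iid -> (content, next iid); None marks the end of the list.
rows = {
    1: ("Item1", 4),
    4: ("Item4", 2),
    2: ("Item2", 3),
    3: ("Item3", None),
}
print(order_from_links(rows))
```

The cycle and head checks are there because nothing in the schema itself stops the `next` column from holding inconsistent links.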

Is there an approach that I haven't thought of? Please let me know.


Answer 1:


I think @a1ex07 is on the right track here (+1). I don't think gaps in itemOrder violate 3NF, but I do worry about a different violation of 3NF (more on this below). We also have to watch out for bad data in the itemOrder field. Here's how I'd start:

create table pages (
  pid int,
  primary key (pid)
);

create table users (
  uid int,
  primary key (uid)
);

create table items (
  iid int,
  primary key (iid)
);

create table details (
  pid int not null references pages(pid),
  uid int not null references users(uid),
  iid int not null references items(iid), 
  itemOrder int,
  primary key (pid, uid, iid),
  unique (pid, uid, itemOrder)
);

The primary key ensures that for each page, for each user, there are unique items. The unique constraint ensures that for each page, for each user, there are unique itemOrders. Here's my worry about 3NF: in this scenario, itemOrder is not fully dependent on the primary key; it depends only on the (pid, uid) parts. That's not even 2NF; and that's a problem. We could include itemOrder in the primary key, but then I worry that it might not be minimal, as PKs need to be. We might need to decompose this into more tables. Still thinking . . .


[ EDIT - More thinking on the topic . . . ]

Assumptions

  1. There are users.

  2. There are pages.

  3. There are items.

  4. (page, user) identifies a SET of items.

  5. (page, user) identifies an ordered LIST of slots in which we can store items if we like.

  6. We do not wish to have duplicate items in a (page,user)'s list.

Plan A

Kill the details table, above.

Add a table, ItemsByPageAndUser, to represent the SET of items identified by (page, user).

create table ItemsByPageAndUser (
   pid int not null references pages(pid),
   uid int not null references users(uid),
   iid int not null references items(iid),
  primary key (pid, uid, iid)   
);

Add table, SlotsByPageAndUser, to represent the ordered LIST of slots that might contain items.

create table SlotsByPageAndUser (
   pid       int not null references pages(pid),
   uid       int not null references users(uid),
   slotNum   int not null,
   iidInSlot int          references items(iid),
 primary key (pid, uid, slotNum),   
 foreign key (pid, uid, iidInSlot) references ItemsByPageAndUser(pid, uid, iid),
 unique (pid, uid, iidInSlot)
);

Note 1: iidInSlot is nullable so that we can have empty slots if we want to. But if there is an item present, it has to be checked against the items table.

Note 2: We need the last FK to ensure that we don't add any items that are not in the set of possible items for this (user,page).

Note 3: The unique constraint on (pid, uid, iidInSlot) enforces our design goal of having unique items in the list (assumption 6). Without this we could add as many items from the set identified by (page,user) as we like so long as they are in different slots.

Now we have nicely decoupled the items from their slots while preserving their common dependence on (page, user).

This design is certainly in 3NF and might be in BCNF, though I worry about SlotsByPageAndUser in that regard.

The problem is that because of the unique constraint in table SlotsByPageAndUser the cardinality of the relationship between SlotsByPageAndUser and ItemsByPageAndUser is one-to-one. In general, 1-1 relationships that are not entity subtypes are wrong. There are exceptions, of course, and maybe this is one. But maybe there's an even better way . . .

Plan B

  1. Kill the SlotsByPageAndUser table.

  2. Add a slotNum column to ItemsByPageAndUser.

  3. Add a unique constraint on (pid, uid, slotNum) to ItemsByPageAndUser.

Now it's:

create table ItemsByPageAndUser (
   pid     int not null references pages(pid),
   uid     int not null references users(uid),
   iid     int not null references items(iid),
   slotNum int,
 primary key (pid, uid, iid),   
 unique (pid, uid, slotNum)
);

Note 4: Leaving slotNum nullable preserves our ability to specify items in the set that are not in the list. But . . .

Note 5: Putting a unique constraint on an expression involving a nullable column might cause "interesting" results in some databases. I think it will work as we intend it to in Postgres. (See this discussion here on SO.) For other databases, your mileage may vary.

Now there is no messy 1-1 relationship hanging around, so that's better. It's still 3NF as the only non-key attribute (slotNum) depends on the key, the whole key, and nothing but the key. (You can't ask about slotNum without telling me what page, user, and item you are talking about.)

It's not BCNF because [ (pid, uid, iid) -> slotNum ] and [(pid,uid,slotNum) -> iid ]. But that's why we have the unique constraint on (pid, uid, slotNum) which prevents the data from getting into an inconsistent state.

I think this is a workable solution.
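
To see how the Plan B constraints behave, here is a small check using Python's sqlite3 module. Note the assumptions: this runs against SQLite, not the Postgres mentioned in Note 5, so it only shows what SQLite does with a nullable column in a unique constraint. The sample rows and the `try_insert` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
create table pages (pid int primary key);
create table users (uid int primary key);
create table items (iid int primary key);
create table ItemsByPageAndUser (
   pid     int not null references pages(pid),
   uid     int not null references users(uid),
   iid     int not null references items(iid),
   slotNum int,
 primary key (pid, uid, iid),
 unique (pid, uid, slotNum)
);
insert into pages values (1);
insert into users values (1);
insert into items values (1), (2), (3), (4);
""")

def try_insert(row):
    """Insert (pid, uid, iid, slotNum); return False if a constraint rejects it."""
    try:
        conn.execute("insert into ItemsByPageAndUser values (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

in_list = try_insert((1, 1, 1, 1))          # item 1 sits in slot 1
not_listed_a = try_insert((1, 1, 2, None))  # item 2 is in the set but not in the list
not_listed_b = try_insert((1, 1, 3, None))  # a second NULL slot for the same (page, user)
dup_slot = try_insert((1, 1, 4, 1))         # slot 1 is already taken
dup_item = try_insert((1, 1, 1, 2))         # item 1 is already in the set

print(in_list, not_listed_a, not_listed_b, dup_slot, dup_item)
```

In SQLite, NULLs count as distinct for a unique constraint, so several unlisted items can coexist while a duplicate slot or a duplicate item is rejected.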




Answer 2:


Solution: make index a string (because strings, in essence, have infinite "arbitrary precision"). Or if you use an int, increment index by 100 instead of 1.

The performance problem is this: there are no "in between" values between two adjacent sorted items.

item      index
-----------------
gizmo     1
              <<------ Oh no! no room between 1 and 2.
                       This requires incrementing _every_ item after it
gadget    2
gear      3
toolkit   4
box       5

Instead, do it like this (an even better solution follows below):

item      index
-----------------
gizmo     100
              <<------ Sweet :). I can re-order 99 (!) items here
                       without having to change anything else
gadget    200
gear      300
toolkit   400
box       500

Even better: here is how Jira solves this problem. Their "rank" (what you call index) is a string value that allows a ton of breathing room in between ranked items.

Here is a real example from a Jira database I work with:

   id    | jira_rank
---------+------------
 AP-2405 | 0|hzztxk:
 ES-213  | 0|hzztxs:
 AP-2660 | 0|hzztzc:
 AP-2688 | 0|hzztzk:
 AP-2643 | 0|hzztzs:
 AP-2208 | 0|hzztzw:
 AP-2700 | 0|hzztzy:
 AP-2702 | 0|hzztzz:
 AP-2411 | 0|hzztzz:i
 AP-2440 | 0|hzztzz:r

Notice this example: hzztzz:i. The advantage of a string rank is that when you run out of room between two items, you still don't have to re-rank anything else. You just start appending more characters to the string to narrow down the focus.




Answer 3:


You could add a new character (nvarchar) column called order to the Page table, containing a delimited list of iids in the order you prefer, e.g. 1,4,3,2. The advantage is that there is just one field in one table to maintain. The obvious disadvantage is the need to write utility functions to convert between the character and numeric types, which in reality probably wouldn't take too long.
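
Those utility functions might look roughly like this in Python. It is a sketch that assumes comma-delimited integer iids; `parse_order`, `format_order`, and `move` are hypothetical names.

```python
def parse_order(order_text):
    """Turn the stored text, e.g. "1,4,3,2", into a list of integer iids."""
    return [int(part) for part in order_text.split(",") if part.strip()]

def format_order(iids):
    """Turn a list of iids back into the delimited text stored on the page row."""
    return ",".join(str(iid) for iid in iids)

def move(iids, iid, new_position):
    """Return a new list with `iid` moved to the zero-based `new_position`."""
    reordered = [x for x in iids if x != iid]
    reordered.insert(new_position, iid)
    return reordered

order = parse_order("1,4,3,2")
order = move(order, 2, 1)   # move item 2 to second place
print(format_order(order))
```

A reorder then rewrites a single text value, at the cost of the database no longer being able to sort or validate the items itself.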




Answer 4:


If you don't expect the number of items to be huge, you can use a slightly modified version of your first approach. Just leave gaps between consecutive indexes. For example, the first item has index 100, the second 200, and so on. This way you don't have to update all the indexes every time, only when you cannot find a gap.
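
One way this could work is sketched below in Python. The gap size of 100 comes from the example above; the function names and the policy of renumbering the whole list when no gap is left are assumptions made for illustration.

```python
GAP = 100

def index_between(before, after):
    """Return an integer strictly between the two indexes, or None if there is no gap."""
    if after - before < 2:
        return None
    return (before + after) // 2

def renumber(count):
    """Fresh, evenly spaced indexes for `count` items: 100, 200, 300, ..."""
    return [GAP * (i + 1) for i in range(count)]

def insert_index(indexes, position):
    """Insert a new item at list `position` into the sorted `indexes`.

    Returns (new_indexes, renumbered). Only when no gap exists are all rows rewritten.
    """
    before = indexes[position - 1] if position > 0 else 0
    after = indexes[position] if position < len(indexes) else before + 2 * GAP
    new_index = index_between(before, after)
    if new_index is not None:
        return indexes[:position] + [new_index] + indexes[position:], False
    return renumber(len(indexes) + 1), True

print(insert_index([100, 200, 300], 1))
print(insert_index([1, 2, 3], 1))
```

In the first call only the new row gets a value; in the second the indexes are adjacent, so every row is renumbered once and the gaps are restored.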




Answer 5:


Use Approach 1 and live with the performance implications of index updates. Unless you are dealing with millions of items per page, you are unlikely to find the performance lacking, and you retain all the power of SQL in dealing with sets of data.

In addition to being much harder to work with from pure non-procedural SQL, Approach 2 would still require you to traverse the list to find the right place to reconnect the "links" when reordering an item.



Source: https://stackoverflow.com/questions/9536262/best-representation-of-an-ordered-list-in-a-database
