Database design for answer table in quiz app

问题

I am designing a quiz app and i am stuck on how to design the answers table.

Assume i have the following tables:

User(user_id,...other columns)
Question(question_id,user_id,...other columns)
QuestionAnswers(question_id,answer_id... other columns)

Now what to do about the UserAnswers table? The structure I have in mind is this:

UserAnswers(user_id,question_id,answer_id,.. other columns)

The structure that I have made works great at start, but the performance starts degrading once I reach 10 million rows. Considering my app, if 10,000 questions are present, and there are 1000 users in the system and each user answers each of the question. I will easily get to 10 million rows, and as users and questions grow the table size will grow dramatically.

What is a better approach to store these answers?

Moreover, I designed the system in MySQL. Do you think the same table structure would work better in some other DBMS?

mysql> explain select count(*) from user_answer where question_id = 9845;
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | user_answer | NULL       | ref  | question_id   | question_id | 4       | const |  645 |   100.00 | Using index |
+----+-------------+-------------+------------+------+---------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


mysql> explain select count(*) from user_answer;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)


mysql> select count(*) from user_answer;
+----------+
| count(*) |
+----------+
| 20042126 |
+----------+
1 row in set (11 min 30.33 sec)

回答1:

A general indexing concept is what is left-most in a key. Let's take for example the following key (whether or not it is primary is not the focus here)

key(a,b,c)

For queries such as

select region from myTable where c='Turkey'

the above key is not used. You may endure a tablescan.

For queries such as

select region from myTable where a=17 and c='Turkey'

The key is used up to the most left-most part used, which is a, as b is not in the query. So the key is useful, but not entirely useful. Meaning, at least it quickly gets you to the segmented a rows, but performs a where from there.

Let me say the above another way: In that query, it does not fully use the index to get to c. It knows b is not in the mix of the query, and does not magically jump over b to get to c while fully using the index. But at least the index is partially used.

This is why on thin index width such as ints, and with composites, I often create the second composite index "going the other way", as shown in this answer for Junction tables:

unique key(studentId,courseId,term), -- no duplicates allowed for the combo (note student can re-take it next term)
key (courseId,studentId),

ignore term in there for this discussion. Point being, those are thin ints (relatively low overhead). The second key will require overhead. So it comes at a cost, a cost I am willing to pay. But for queries going in the other direction, I am covered. Meaning, queries involving courseId without the studentId.

Note, my composite flip in the above is not a stellar one. It has often been pointed out to me that having it as shown causes unnecessary overhead. In particular, for the 2nd key, it should just be on courseId (a non-composite). If on the first key, for whatever reason I had term wedged into 2nd place, then it would be a valid example.

A better example would be

key (a,b,c),
key (c,b)

The above, among other things, would be useful for queries going against just c, and also b and c together. But not just b.

The Takeaway:

Resist the urge to splatter new indexes into your schema foolishly thinking they will be used. Especially for non-left-most columns not picked up in actual and frequent queries. And certainly not for those just mentioned and wider columns like varchar(100) times several flips in ordering in multiple indexes. All they do potentially is slow down the inserts and updates and offer, many times, zero performance gains in actual queries. So scrutinize it all.

All index choices come at a cost. Only you should make that determination for what is right for your system.

来源：https://stackoverflow.com/questions/38205044/database-design-for-answer-table-in-quiz-app

标签

php

mysql

database

postgresql

database-design