One large table with 100 columns vs. many small tables

萌比男神i 2021-02-19 22:36

I created a website that contains users, comments, videos, photos, messages and more. All of the data is in one table that contains 100 columns. I thought one table is better th

3 answers
  • 2021-02-19 22:53

    For your situation it is better to have multiple tables. If you put all your data into one table, you will have update anomalies. For example, if a user decides to update their username, you will have to update every single row in your big table that contains that username. If you split the data into multiple tables, you only need to update one row in your User table, and all the rows in your other tables reference that updated row.

    As for speed, SELECT statements against one table can be faster than against multiple tables because joins add work (although joins on indexed keys are usually cheap). INSERT statements will be about the same speed in either situation because you are inserting one row. However, changing someone's username with an UPDATE statement will be very slow with one table if there is a lot of data about them, because every one of their rows has to be found and updated, as opposed to a single row in the User table.

    So, you should create tables for everything you mentioned in your first sentence (users, comments, videos, photos, and messages) and connect them using Ids like this:

    User
    -Id
    -Username
    
    Video
    -Id
    -UploaderId references User.Id
    -VideoUrl
    
    Photo
    -Id
    -UploaderId references User.Id
    -PhotoUrl
    
    VideoComment
    -CommenterId references User.Id
    -VideoId references Video.Id
    -CommentText
    
    PhotoComment
    -CommenterId references User.Id
    -PhotoId references Photo.Id
    -CommentText
    
    Message
    -SenderId references User.Id
    -ReceiverId references User.Id
    -MessageText
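
    The layout above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite stands in for whatever database you use, only three of the tables are created, and the column types, sample rows and example.com URL are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE User  (Id INTEGER PRIMARY KEY, Username TEXT NOT NULL);
CREATE TABLE Video (Id INTEGER PRIMARY KEY,
                    UploaderId INTEGER NOT NULL REFERENCES User(Id),
                    VideoUrl TEXT NOT NULL);
CREATE TABLE VideoComment (CommenterId INTEGER NOT NULL REFERENCES User(Id),
                           VideoId INTEGER NOT NULL REFERENCES Video(Id),
                           CommentText TEXT NOT NULL);
""")

# Hypothetical sample data: one user, one video, two comments.
conn.execute("INSERT INTO User VALUES (1, 'alice')")
conn.execute("INSERT INTO Video VALUES (10, 1, 'http://example.com/v/10')")
conn.executemany("INSERT INTO VideoComment VALUES (1, 10, ?)",
                 [("first",), ("second",)])

# Renaming the user touches exactly one row, however many comments exist.
renamed = conn.execute(
    "UPDATE User SET Username = 'alice2' WHERE Id = 1").rowcount

# Every comment picks up the new name through the join.
names = [row[0] for row in conn.execute("""
    SELECT u.Username
    FROM VideoComment AS c
    JOIN User AS u ON u.Id = c.CommenterId
""")]
print(renamed, names)
```

    The username is stored once, so there is no second copy that could be missed by the update.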
    
  • 2021-02-19 22:55

    100 columns in a single table is bad design in most situations.

    Read this page: http://www.tutorialspoint.com/sql/sql-rdbms-concepts.htm

    Break your data up into related chunks and give each of them their own table.

    You said you have this information (users, comments, videos, photos, messages), so you should have something like these tables.

    1. Users, which contains (User ID, Name, Email, etc.)
    2. Comments, which contains (Comment ID, User ID, Comment Text, etc.)
    3. Videos, which contains (Video ID, User ID, Comment ID, Video Data, etc.)
    4. Photos, which contains (Photo ID, User ID, Comment ID, Photo Data, etc.)
    5. Messages, which contains (Message ID, User ID, Message Text, etc.)

    Then when you're writing your SQL you can query based on exactly what information you need.

    SELECT USR.UserID, MSG.MessageID, MSG.MessageText
    FROM Users AS USR
        JOIN Messages AS MSG
            ON USR.UserID = MSG.UserID
    WHERE USR.UserID = 1234567
    

    With your current query you're having to deal with rows containing data that you don't need or care about.

    EDIT: Just to give the OP some further information on why this is a better design.

    Let's take "Users" as a starting example.

    In a proper database design you would have a table called Users which has all the columns a user needs in order to exist: username, email, ID number, etc.

    Now we want to create a new user, so we want to insert username, email and ID number. But wait, I still have to populate 97 other columns with information totally unrelated to creating a new user! Even if you store NULL in all of those columns, it's going to use some space in the database.

    Also imagine you have hundreds of users all trying to select, update and delete from a single database table. There is a high chance of the table being locked. But if one user is updating the Users table and another is inserting into the Messages table, the work is spread out.

    And as other users have said, there is plain performance. The database needs to fetch all the information and filter out what you want. If you have a lot of columns, this is unnecessary work.

    Performance example.

    Let's say your database has been running for years. You have 5,000 users, 2,000,000 comments, 300,000 pictures and 1,000,000 messages. Your single table now contains 3,305,000 records.

    Now you want to find out whether the user with the ID 12345 has more than 20 pictures. Without a suitable index, the database has to search through all 3,305,000 records to get this result.

    If you had a split table design, it would only need to search through 305,000 records (5,000 users plus 300,000 pictures).

    An obvious performance gain!
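
    As a rough sketch of that lookup in a split design, again using SQLite from Python as a stand-in: the table names follow the list earlier in this answer, while the index name, the helper has_more_than and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users  (UserID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Photos (PhotoID INTEGER PRIMARY KEY,
                     UserID INTEGER REFERENCES Users(UserID));
-- Hypothetical index so the lookup does not have to scan every photo.
CREATE INDEX idx_photos_user ON Photos(UserID);
""")

# Made-up data: user 12345 uploaded 25 photos, user 2 uploaded 3.
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [(12345, "heavy uploader"), (2, "casual")])
conn.executemany("INSERT INTO Photos (UserID) VALUES (?)",
                 [(12345,)] * 25 + [(2,)] * 3)

def has_more_than(conn, user_id, n):
    """Return True if the user owns more than n photos."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Photos WHERE UserID = ?", (user_id,)).fetchone()
    return count > n

print(has_more_than(conn, 12345, 20), has_more_than(conn, 2, 20))
```

    The query only ever looks at photo rows; comments and messages are never read.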

    EDIT 2

    Performance TEST.

    I created a dummy table containing 2 million rows and 1 column. I ran the below query which took 120ms on average over 10 executions.

    SELECT MyDate1 from dbo.DummyTable where MyDate1 BETWEEN '2015-02-15 16:59:00.000' and '2015-02-15 16:59:59.000'
    

    I then truncated the table and created 6 more columns and populated them with 2 million rows of test data and ran the same query. It took 210ms on average over 10 executions.

    So adding more columns decreased performance in this test even though the query never reads the extra data, most likely because each row is wider and the scan has to read more pages.
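
    A small way to see the "wider rows mean more to read" effect without timing anything is to compare storage size. This sketch is not the test above: it uses SQLite from Python as a stand-in engine, only 2,000 rows, and hypothetical filler columns (Filler0 to Filler5) and a hypothetical helper name.

```python
import os
import sqlite3
import tempfile

def file_size_after_load(path, extra_columns, rows=2000):
    """Create DummyTable with MyDate1 plus `extra_columns` filler columns,
    load `rows` identical rows, and return the database file size in bytes."""
    filler_cols = "".join(f", Filler{i} TEXT" for i in range(extra_columns))
    conn = sqlite3.connect(path)
    conn.execute(f"CREATE TABLE DummyTable (MyDate1 TEXT{filler_cols})")
    placeholders = ", ".join("?" * (1 + extra_columns))
    row = ("2015-02-15 16:59:30.000",) + ("filler text " * 4,) * extra_columns
    conn.executemany(f"INSERT INTO DummyTable VALUES ({placeholders})",
                     [row] * rows)
    conn.commit()
    conn.close()
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as tmp:
    narrow = file_size_after_load(os.path.join(tmp, "narrow.db"), extra_columns=0)
    wide = file_size_after_load(os.path.join(tmp, "wide.db"), extra_columns=6)

print(narrow, wide)
```

    The same number of rows takes more bytes on disk in the wide table, so a full scan for MyDate1 has more data to walk through.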

  • 2021-02-19 23:02

    Wide tables can cause performance problems if they are wider than the database can store in one place.

    You need to read about normalization, as this type of structure is very bad and is not what the database is optimized for. In your case you will have many repeated records, and you will have to use DISTINCT (which is a performance killer) to get rid of them when you only want to show the user name or the comments.
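
    A minimal sketch of that duplication, using SQLite from Python; the table name Everything, its two columns and the sample rows are hypothetical stand-ins for the 100-column table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide table: the username is repeated on every comment row.
conn.execute("CREATE TABLE Everything (Username TEXT, CommentText TEXT)")
conn.executemany("INSERT INTO Everything VALUES (?, ?)",
                 [("alice", "hi"), ("alice", "nice video"), ("bob", "thanks")])

# Listing users returns one row per comment ...
plain = [r[0] for r in conn.execute(
    "SELECT Username FROM Everything ORDER BY Username")]

# ... so DISTINCT is needed just to get each user once.
deduped = [r[0] for r in conn.execute(
    "SELECT DISTINCT Username FROM Everything ORDER BY Username")]

print(plain, deduped)
```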

    Additionally, you may have some fields that are repeats like comment1, comment2, etc. Those are very hard to query over time and if you need another one, then you have to change the table structure and potentially change the queries. That is a bad way to do business.
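
    The usual alternative to comment1, comment2, ... is a child table with one row per comment. A sketch with SQLite from Python, where the table names, columns and sample rows are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Videos   (VideoID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Comments (CommentID INTEGER PRIMARY KEY,
                       VideoID INTEGER NOT NULL REFERENCES Videos(VideoID),
                       CommentText TEXT NOT NULL);
""")
conn.execute("INSERT INTO Videos VALUES (1, 'demo')")

# A third (or hundredth) comment is just another row, not a new column.
for text in ["first", "second", "third"]:
    conn.execute(
        "INSERT INTO Comments (VideoID, CommentText) VALUES (1, ?)", (text,))

comments = [r[0] for r in conn.execute(
    "SELECT CommentText FROM Comments WHERE VideoID = 1 ORDER BY CommentID")]
print(comments)
```

    No ALTER TABLE and no query changes are needed when the number of comments grows.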

    Further, when you only have one table, it becomes a hot spot in your database and you will have more locking and blocking.

    Now suppose that one of those pieces of information is updated: you have to make sure to update all the records, not just one. This can also be a performance killer, and if you don't do it you will have data integrity problems which make the data in your database essentially useless. Denormalizing is almost always a bad idea, and it is always a bad idea when done by someone who is not an expert in database design. There are many ramifications of denormalization that you probably haven't thought of. Overall your strategy is a sure loser over time and needs to be fixed ASAP, because the more records you have in a database, the harder it is to refactor.
