Design Pattern for Custom Fields in Relational Database

礼貌的吻别 2020-12-08 17:16

I have been assigned a task to create a (relatively) simple reporting system. In this system, the user will be shown a table with the results of a report. A table has some fields and each field g

4 answers
  •  感动是毒
    2020-12-08 17:55

    Your design is a variation of the Entity Attribute Value (EAV) data model, which is often regarded as an anti-pattern in database design.

    Maybe a better approach for you would be to create a reporting values table with, say, 300 columns (NUMBER_VALUE_1 through NUMBER_VALUE_100, VARCHAR2_VALUE_1..100, and DATE_VALUE_1..100).

    Then, design the rest of your data model around tracking which reports use which columns and what they use each column for.

    This has two benefits: first, you are not storing dates and numbers in strings (the benefits of which have already been pointed out), and second, you avoid many of the performance and data integrity issues associated with the EAV model.
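As a rough sketch of this layout, the snippet below uses Python's built-in sqlite3 module as a stand-in for Oracle. The table names report_values and report_column_map, the ALLOWED_COLUMNS whitelist, and the build_select helper are all hypothetical, and only a handful of slot columns are created instead of 300. SQLite has no DATE type, so the date slot holds ISO text here; in Oracle it would be a real DATE column.

```python
import sqlite3

# Slot columns that exist in the (hypothetical) wide table; a real schema
# would have up to 100 of each type.
ALLOWED_COLUMNS = {"number_value_1", "number_value_2",
                   "varchar2_value_1", "date_value_1"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_values (
        report_header_id INTEGER NOT NULL,
        report_record_id INTEGER NOT NULL,
        number_value_1   NUMERIC,
        number_value_2   NUMERIC,
        varchar2_value_1 TEXT,
        date_value_1     TEXT,   -- ISO-8601 text in SQLite; DATE in Oracle
        PRIMARY KEY (report_header_id, report_record_id)
    );
    -- Metadata: which slot column holds which logical field for a report type.
    CREATE TABLE report_column_map (
        report_type_id INTEGER NOT NULL,
        field_name     TEXT NOT NULL,
        column_name    TEXT NOT NULL,
        PRIMARY KEY (report_type_id, field_name)
    );
""")

conn.executemany(
    "INSERT INTO report_column_map VALUES (?, ?, ?)",
    [(1, "HEADER_ID", "number_value_1"),
     (1, "LINE_ID", "number_value_2"),
     (1, "ORDERED_ITEM", "varchar2_value_1"),
     (1, "SCHEDULE_SHIP_DATE", "date_value_1")],
)
conn.execute(
    "INSERT INTO report_values VALUES (?, ?, ?, ?, ?, ?)",
    (20, 1, 1001, 5001, "WIDGET-A", "2020-12-01"),
)


def build_select(conn, report_type_id):
    """Build a SELECT that aliases slot columns to the report's field names."""
    mapping = conn.execute(
        "SELECT field_name, column_name FROM report_column_map "
        "WHERE report_type_id = ? ORDER BY field_name",
        (report_type_id,),
    ).fetchall()
    select_list = []
    for field_name, column_name in mapping:
        # Identifiers cannot be bound as parameters, so validate them first.
        if column_name not in ALLOWED_COLUMNS or not field_name.isidentifier():
            raise ValueError(f"bad mapping: {field_name} -> {column_name}")
        select_list.append(f'{column_name} AS "{field_name}"')
    return ("SELECT " + ", ".join(select_list) +
            " FROM report_values WHERE report_header_id = ?")


sql = build_select(conn, 1)
rows = conn.execute(sql, (20,)).fetchall()
print(rows)
```

Each report record stays a single row with typed values, and the mapping table is consulted only to decide which columns to select and what to call them.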

    EDIT -- adding some empirical results of an EAV model

    Using an Oracle 11gR2 database, I moved 30,000 records from one table into an EAV data model. I then queried the model to get those 30,000 records back.

    SELECT SUM (header_id * LENGTH (ordered_item) * (SYSDATE - schedule_ship_date))
    FROM   (SELECT rf.report_type_id,
                   rv.report_header_id,
                   rv.report_record_id,
                   MAX (DECODE (rf.report_field_name, 'HEADER_ID', rv.number_value, NULL)) header_id,
                   MAX (DECODE (rf.report_field_name, 'LINE_ID', rv.number_value, NULL)) line_id,
                   MAX (DECODE (rf.report_field_name, 'ORDERED_ITEM', rv.char_value, NULL)) ordered_item,
                   MAX (DECODE (rf.report_field_name, 'SCHEDULE_SHIP_DATE', rv.date_value, NULL)) schedule_ship_date
            FROM   eav_report_record_values rv
                   INNER JOIN eav_report_fields rf ON rf.report_field_id = rv.report_field_id
            WHERE  rv.report_header_id = 20
            GROUP BY rf.report_type_id, rv.report_header_id, rv.report_record_id);
    

    The results were:

    1 row selected.
    
    Elapsed: 00:00:22.62
    
    Execution Plan
    ----------------------------------------------------------
    
    ----------------------------------------------------------------------------------------------------
    | Id  | Operation                       | Name                        | Rows  | Bytes | Cost (%CPU)|
    ----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                |                             |     1 |  2026 |    53  (67)|
    |   1 |  SORT AGGREGATE                 |                             |     1 |  2026 |            |
    |   2 |   VIEW                          |                             |   130K|   251M|    53  (67)|
    |   3 |    HASH GROUP BY                |                             |   130K|   261M|    53  (67)|
    |   4 |     NESTED LOOPS                |                             |       |       |            |
    |   5 |      NESTED LOOPS               |                             |   130K|   261M|    36  (50)|
    |   6 |       TABLE ACCESS FULL         | EAV_REPORT_FIELDS           |   350 | 15050 |    18   (0)|
    |*  7 |       INDEX RANGE SCAN          | EAV_REPORT_RECORD_VALUES_N1 |   130K|       |     0   (0)|
    |*  8 |      TABLE ACCESS BY INDEX ROWID| EAV_REPORT_RECORD_VALUES    |   372 |   749K|     0   (0)|
    ----------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       7 - access("RV"."REPORT_HEADER_ID"=20)
       8 - filter("RF"."REPORT_FIELD_ID"="RV"."REPORT_FIELD_ID")
    
    Note
    -----
       - 'PLAN_TABLE' is old version
    
    
    Statistics
    ----------------------------------------------------------
              4  recursive calls
              0  db block gets
         275480  consistent gets
            465  physical reads
              0  redo size
            307  bytes sent via SQL*Net to client
            252  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
    

    That is 22 seconds, and about 275,000 consistent gets, to return 30,000 rows of 4 columns each, which is far too long. From a flat table, I would expect the same query to finish in under 2 seconds, easily.
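To make the comparison concrete, here is a small sketch of the same pivot next to a flat table holding identical data. It runs on Python's sqlite3 rather than Oracle, so DECODE is replaced by CASE and the column types are simplified; the flat_report_records table and the three sample rows are made up for illustration. It only shows that the EAV model stores four value rows per report record and needs a join plus a GROUP BY to rebuild them. It does not reproduce the timings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav_report_fields (
        report_field_id   INTEGER PRIMARY KEY,
        report_type_id    INTEGER NOT NULL,
        report_field_name TEXT NOT NULL
    );
    CREATE TABLE eav_report_record_values (
        report_header_id INTEGER NOT NULL,
        report_record_id INTEGER NOT NULL,
        report_field_id  INTEGER NOT NULL,
        number_value     NUMERIC,
        char_value       TEXT,
        date_value       TEXT
    );
    -- Hypothetical flat equivalent: one row per report record.
    CREATE TABLE flat_report_records (
        report_header_id   INTEGER NOT NULL,
        report_record_id   INTEGER NOT NULL,
        header_id          INTEGER,
        line_id            INTEGER,
        ordered_item       TEXT,
        schedule_ship_date TEXT
    );
""")

FIELDS = [(1, "HEADER_ID"), (2, "LINE_ID"),
          (3, "ORDERED_ITEM"), (4, "SCHEDULE_SHIP_DATE")]
conn.executemany("INSERT INTO eav_report_fields VALUES (?, 1, ?)", FIELDS)

# Load the same three sample records into both models.
for record_id in range(1, 4):
    header_id, line_id = 1000 + record_id, 5000 + record_id
    item, ship_date = f"ITEM-{record_id}", f"2020-12-0{record_id}"
    conn.execute(
        "INSERT INTO flat_report_records VALUES (20, ?, ?, ?, ?, ?)",
        (record_id, header_id, line_id, item, ship_date),
    )
    # EAV: one row per field, with only the matching typed column filled in.
    conn.executemany(
        "INSERT INTO eav_report_record_values VALUES (20, ?, ?, ?, ?, ?)",
        [(record_id, 1, header_id, None, None),
         (record_id, 2, line_id, None, None),
         (record_id, 3, None, item, None),
         (record_id, 4, None, None, ship_date)],
    )

PIVOT_SQL = """
    SELECT rv.report_record_id,
           MAX(CASE WHEN rf.report_field_name = 'HEADER_ID' THEN rv.number_value END),
           MAX(CASE WHEN rf.report_field_name = 'LINE_ID' THEN rv.number_value END),
           MAX(CASE WHEN rf.report_field_name = 'ORDERED_ITEM' THEN rv.char_value END),
           MAX(CASE WHEN rf.report_field_name = 'SCHEDULE_SHIP_DATE' THEN rv.date_value END)
    FROM   eav_report_record_values rv
           JOIN eav_report_fields rf ON rf.report_field_id = rv.report_field_id
    WHERE  rv.report_header_id = 20
    GROUP BY rv.report_record_id
    ORDER BY rv.report_record_id
"""
FLAT_SQL = """
    SELECT report_record_id, header_id, line_id, ordered_item, schedule_ship_date
    FROM   flat_report_records
    WHERE  report_header_id = 20
    ORDER BY report_record_id
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
flat = conn.execute(FLAT_SQL).fetchall()
eav_row_count = conn.execute(
    "SELECT COUNT(*) FROM eav_report_record_values").fetchone()[0]
print(pivoted == flat, len(flat), eav_row_count)
```

Both queries return the same records, but the EAV version has to read four times as many rows and aggregate them back together, which is where the extra work in the plan above comes from.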
