Speed: Store aggregate values in database or calculate with Jinja?

Submitted by 我是研究僧i on 2019-12-11 03:24:23

Question


Language: Python

Database: SQLite

Using: Flask, SQLAlchemy ORM


My question is probably overkill, but I'm very curious.

I have columns in an SQLAlchemy Table that hold certain values that I need to perform mathematical operations on, to display aggregate values or calculated values.

Let's assume:

Column 1: 0

Column 2: 5

Column 3: 2

Column 4: 6

In an HTML table, I need to rely on those values to calculate and display a result of an arithmetic operation on them.

Example: ( Column 1 + Column 2 + Column 3 / Column 6 ) * 100

Do I calculate those numbers and store them in a new column in my SQLite database (using SQLAlchemy), or calculate them on the fly using Jinja2?


Answer 1:


There is really no single correct solution for any optimization problem. You will have to find the optimal solution through testing. Your case is an optimization for time (speed), so we should look at memory (the trade-off) and at how the data is persisted and accessed. Here are the layers your data passes through:

Disk -> SQLite Driver -> Python SQLite DBAPI -> SQLAlchemy -> Jinja

Excluding the disk (SQLite offers few tricks for physical storage optimization, since the database is a single file after all) and the DBAPI layer (it is well integrated with SQLAlchemy, and you don't have much of a choice between DBAPI drivers for SQLite), here are the possible ways to calculate a column in each remaining layer:

  1. SQLite Driver - You can create a view in SQLite for the calculated column

    • A view is seen by the upper layers as if it's a table
    • You can change the upper layers while keeping the same view definition
    • The calculation cannot be modified dynamically; you have to drop and recreate the view
    • The calculation cannot be memoized at this layer
    • Views are read-only, so building an ORM wrapper around one is a bit pointless
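    -- The column list after the view name lets you rename the selected columns.
    CREATE VIEW view_name (
        column_1,
        column_2,
        column_3_you_can_rename_columns_here,
        column_6,
        column_X)
    AS SELECT
        column_1,
        column_2,
        column_3,
        column_6,
        -- If column_3 and column_6 are both INTEGER, "/" truncates to an integer.
        (column_1 + column_2 + column_3 / column_6) * 100.0
    FROM table_name;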
    
  2. SQLAlchemy - A calculated column can be added to your Table class definition

    • Option available to persist/save the calculated value as an actual column in the database
    • Can dynamically change the calculation as it's in the Python layer
    • Can memoize the calculation
    • For a persisted calculated column, see: https://stackoverflow.com/a/4284191/1027422
    • For a simple Python-only (not saved to DB) calculated column, see: http://docs.sqlalchemy.org/en/latest/orm/mapped_sql_expr.html
    • For caching or memoizing calculations, see the lru_cache section of https://docs.python.org/3/library/functools.html
  3. Jinja - Calculations can also be done in Jinja

    • Calculations done at this layer are not readily passed back to the previous layers, so they are difficult to persist to the database
    • May not be the most efficient
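
As a rough illustration of option 1, the sketch below creates the view through Python's standard-library sqlite3 module against an in-memory database. The sample row (in particular the value 4 for column_6, which the question never specifies) and the REAL column types are assumptions made only so the example runs; table_name, view_name and column_X follow the SQL above.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_name (
        column_1 REAL, column_2 REAL, column_3 REAL, column_6 REAL
    );
    -- Assumed sample row; column_6 = 4 is made up for illustration.
    INSERT INTO table_name VALUES (0, 5, 2, 4);
    CREATE VIEW view_name AS
    SELECT column_1, column_2, column_3, column_6,
           (column_1 + column_2 + column_3 / column_6) * 100.0 AS column_X
    FROM table_name;
    """
)

# Reading from the view looks exactly like reading from a table.
column_x = conn.execute("SELECT column_X FROM view_name").fetchone()[0]

# Writing to the view is rejected by SQLite.
try:
    conn.execute("UPDATE view_name SET column_1 = 1")
    view_is_writable = True
except sqlite3.OperationalError:
    view_is_writable = False

print(column_x, view_is_writable)
```

The upper layers only ever see view_name, so the formula can later be changed by dropping and recreating the view without touching the Python or Jinja code.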

From experience, you will often get the best results by pre-calculating at the database level, since the calculation is done in one pass as the data is fetched from disk into memory. However, your choice of database limits your options to mostly doing the optimization at the Python level. You need to test which approach is optimal for your use case by using timeit.
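
A minimal way to run such a test, assuming plain sqlite3 rather than the full Flask/SQLAlchemy stack, is sketched below. The table contents are synthetic, the function names in_sql and in_python are hypothetical, and the timings it prints depend entirely on your machine and data, so treat it as a template rather than a benchmark result.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name "
    "(column_1 REAL, column_2 REAL, column_3 REAL, column_6 REAL)"
)
# Synthetic rows; column_6 is kept non-zero to avoid division by zero.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [(i % 7, i % 5, i % 3, 1 + i % 4) for i in range(10_000)],
)


def in_sql():
    """Let SQLite evaluate the expression while it reads the rows."""
    rows = conn.execute(
        "SELECT (column_1 + column_2 + column_3 / column_6) * 100.0 "
        "FROM table_name"
    )
    return [value for (value,) in rows]


def in_python():
    """Fetch the raw columns and evaluate the expression in Python."""
    rows = conn.execute(
        "SELECT column_1, column_2, column_3, column_6 FROM table_name"
    )
    return [(c1 + c2 + c3 / c6) * 100.0 for c1, c2, c3, c6 in rows]


sql_seconds = timeit.timeit(in_sql, number=20)
python_seconds = timeit.timeit(in_python, number=20)
print(f"SQL: {sql_seconds:.4f}s  Python: {python_seconds:.4f}s")
```

To measure the Jinja option as well, the same pattern applies: wrap the template rendering in a function and hand it to timeit.timeit.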

Memoization may not help you unless your data (the input columns) has frequently repeating values. Do be aware, though, that premature optimization is the root of all evil.
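
The point about repeating values can be seen with functools.lru_cache. In this sketch the function name column_x and the four sample rows are made up for illustration; only rows that repeat an earlier combination of inputs are served from the cache.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def column_x(c1, c2, c3, c6):
    """Hypothetical helper computing the expression from the question."""
    return (c1 + c2 + c3 / c6) * 100.0


# Assumed sample rows: three are identical, one differs.
rows = [(0, 5, 2, 4), (0, 5, 2, 4), (1, 5, 2, 4), (0, 5, 2, 4)]
results = [column_x(*row) for row in rows]

# hits counts calls answered from the cache, misses counts real computations.
info = column_x.cache_info()
print(results, info.hits, info.misses)
```

If every row had distinct inputs, hits would stay at zero and the cache would only add overhead, which is why memoization pays off only for repetitive data.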



Source: https://stackoverflow.com/questions/41086949/speed-store-aggregate-values-in-database-or-calculate-with-jinja
