How to compare 2 successive row values in a resultset object using python

Posted by 帅比萌擦擦* on 2021-01-02 04:11:25

Question


I have a table issue_logs:

 id | issue_id | from_status | to_status |             up_date              |  remarks  
----+----------+-------------+-----------+----------------------------------+-----------
 29 |       20 |          10 |        11 | 2018-09-14 11:43:13.907052+05:30 | UPDATED
 28 |       20 |           9 |        10 | 2018-09-14 11:42:59.612728+05:30 | UPDATED
 27 |       20 |             |         9 | 2018-09-11 17:45:35.13891+05:30  | NEW issue
 26 |       19 |           9 |        11 | 2018-09-06 16:37:05.935588+05:30 | UPDATED
 25 |       19 |             |         9 | 2018-09-06 16:27:40.543001+05:30 | NEW issue
 24 |       18 |          11 |        10 | 2018-09-05 17:13:37.568762+05:30 | UPDATED

and rt_status:

 id |   description    | duration_in_min 
----+------------------+-----------------
  1 | new              |               1
  2 | working          |               1
  3 | approval pending |               1
  4 | resolved         |               1
  5 | initial check    |               1
  6 | parts purchase   |               1
  7 | shipment         |               1
  8 | close            |               1
  9 | initial check    |               1
 10 | parts purchase   |               1
 11 | shipment         |               1
 12 | close            |               1

For the date range from_datetime = '2018-09-06T16:34' to to_datetime = '2018-09-14T12:27', I want to select all issues that stayed in a status longer than the duration_in_min defined for that status in the rt_status table. From issue_logs I should get the records with ids 29, 27, and 26. Records 29 and 26 have no later log entry for their issue, so for them the elapsed time should be measured from their up_date to to_datetime.

I would like to use func.lag and over to do this, but I cannot get the correct records. I am using PostgreSQL 9.6 and Python 2.7. How exactly can I get func.lag or func.lead to work using SQLAlchemy Core only?

What I tried:

    s = select([
            rt_issues.c.id.label('rtissue_id'),
            rt_issues,
            rt_status.c.duration_in_min,
            rt_status.c.id.label('stage_id'),
            issue_status_logs.c.id.label('issue_log_id'),
            issue_status_logs.c.up_date.label('iss_log_update'),
            (issue_status_logs.c.up_date - func.lag(
                    issue_status_logs.c.up_date).over(
                    issue_status_logs.c.issue_id
                    )).label('mdiff'),
            ]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
    outerjoin(issue_status_logs,
              rt_issues.c.id == issue_status_logs.c.issue_id).
    outerjoin(rt_status,
              issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(asc(issue_status_logs.c.up_date),
                  issue_status_logs.c.issue_id).\
    group_by(
             issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id
             )
    rs = g.conn.execute(s)
    mcnt =  rs.rowcount
    print mcnt, 'rowcont'
    if rs.rowcount > 0:
        for r in rs:
            print dict(r)

This yields results that include wrong records, such as the issue log with id 28. Can anyone help me fix the error?
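
A note on the window itself: in SQLAlchemy, the first positional argument of over() is partition_by, so over(issue_status_logs.c.issue_id) partitions by issue but gives the window no ordering, and lag() then has no defined "previous" row. That may be part of why the wrong rows appear. Also, the time an issue spends in a status runs from a log's up_date to the next log's up_date, which is what lead() returns. The sketch below shows the intended window in plain SQL. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later), and the table is a cut-down copy of the sample issue_logs data with the +05:30 offset dropped.

```python
import sqlite3

# In-memory stand-in for the issue_logs table (assumption: SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issue_logs (id INTEGER, issue_id INTEGER, "
    "to_status INTEGER, up_date TEXT)"
)
conn.executemany(
    "INSERT INTO issue_logs VALUES (?, ?, ?, ?)",
    [
        (29, 20, 11, "2018-09-14 11:43:13.907052"),
        (28, 20, 10, "2018-09-14 11:42:59.612728"),
        (27, 20, 9, "2018-09-11 17:45:35.138910"),
        (26, 19, 11, "2018-09-06 16:37:05.935588"),
        (25, 19, 9, "2018-09-06 16:27:40.543001"),
        (24, 18, 10, "2018-09-05 17:13:37.568762"),
    ],
)

# Partition by issue AND order by up_date, so "next row" is well defined.
SQL = """
SELECT id,
       lead(up_date) OVER (PARTITION BY issue_id ORDER BY up_date) AS next_up_date
FROM issue_logs
ORDER BY id
"""
next_dates = dict(conn.execute(SQL).fetchall())

for log_id, next_up_date in sorted(next_dates.items()):
    # next_up_date is None for the latest log of each issue.
    print(log_id, next_up_date)
```

In SQLAlchemy Core the same window would be written with keyword arguments, along the lines of func.lead(col).over(partition_by=..., order_by=...). Rows whose next_up_date is NULL are the latest log of their issue, which is where to_datetime would be substituted.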


Answer 1:


Though you managed to solve this yourself, here is one take that does not use window functions such as lag() or lead(). To compare the up_date timestamps of consecutive issue logs, you could left join the table to itself, matching each log to the log of the same issue whose from_status equals its to_status. In SQL the query could look like

select    ilx.id
from      issue_logs ilx
join      rt_status rsx on rsx.id = ilx.to_status
left join issue_logs ily on  ily.from_status = ilx.to_status
                         and ily.issue_id = ilx.issue_id
where     ilx.up_date >= '2018-09-06T16:34'
and       ilx.up_date <= ( coalesce(ily.up_date, '2018-09-14T12:27') -
                           interval '1 minute' * rsx.duration_in_min );

and the same in SQLAlchemy SQL Expression Language:

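from sqlalchemy import Interval, and_, cast, func, select

# issue_status_logs and rt_status are the Table objects from the question.
from_datetime = '2018-09-06T16:34'
to_datetime = '2018-09-14T12:27'

# Two aliases of the same table: ilx is the log entry being checked,
# ily is the following entry for the same issue (if any).
ilx = issue_status_logs.alias()
ily = issue_status_logs.alias()
rsx = rt_status

query = select([ilx.c.id]).\
    select_from(
        ilx.
        join(rsx, rsx.c.id == ilx.c.to_status).
        outerjoin(ily, and_(ily.c.from_status == ilx.c.to_status,
                            ily.c.issue_id == ilx.c.issue_id))).\
    where(and_(ilx.c.up_date >= from_datetime,
               ilx.c.up_date <= (func.coalesce(ily.c.up_date, to_datetime) -
                                 cast('1 minute', Interval) *
                                 rsx.c.duration_in_min)))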
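
To check the self-join logic against the sample data, here is a minimal sketch that runs an adapted version of the query with Python's built-in sqlite3 module. This is an assumption-laden stand-in, not the PostgreSQL query: SQLite has no interval type, so the comparison is rewritten with julianday() (a difference in days, multiplied by 1440 to get minutes), timestamps are stored as text without the +05:30 offset, and the parameter names range_start and range_end are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue_logs (id INTEGER, issue_id INTEGER, from_status INTEGER,
                         to_status INTEGER, up_date TEXT);
CREATE TABLE rt_status (id INTEGER, duration_in_min INTEGER);
""")
conn.executemany(
    "INSERT INTO issue_logs VALUES (?, ?, ?, ?, ?)",
    [
        (29, 20, 10, 11, "2018-09-14 11:43:13.907052"),
        (28, 20, 9, 10, "2018-09-14 11:42:59.612728"),
        (27, 20, None, 9, "2018-09-11 17:45:35.138910"),
        (26, 19, 9, 11, "2018-09-06 16:37:05.935588"),
        (25, 19, None, 9, "2018-09-06 16:27:40.543001"),
        (24, 18, 11, 10, "2018-09-05 17:13:37.568762"),
    ],
)
# Every status in the sample rt_status table allows 1 minute.
conn.executemany(
    "INSERT INTO rt_status VALUES (?, 1)", [(i,) for i in range(1, 13)]
)

# Same join shape as the answer; only the date arithmetic is SQLite-specific.
SQL = """
SELECT ilx.id
FROM issue_logs ilx
JOIN rt_status rsx ON rsx.id = ilx.to_status
LEFT JOIN issue_logs ily ON ily.from_status = ilx.to_status
                        AND ily.issue_id = ilx.issue_id
WHERE julianday(ilx.up_date) >= julianday(:range_start)
  AND (julianday(coalesce(ily.up_date, :range_end))
       - julianday(ilx.up_date)) * 1440 >= rsx.duration_in_min
ORDER BY ilx.id
"""
params = {"range_start": "2018-09-06 16:34", "range_end": "2018-09-14 12:27"}
overdue_ids = [row[0] for row in conn.execute(SQL, params)]
print(overdue_ids)
```

On the sample rows this selects logs 26, 27, and 29, the ids the question expects. Log 28 is excluded because the issue moved on to status 11 about 14 seconds later.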



Answer 2:


My solution, using a modified SQLAlchemy expression:

s = select([
        rt_issues.c.id.label('rtissue_id'),
        rt_issues.c.title,
        rt_status.c.duration_in_min,
        rt_status.c.is_last_status,
        rt_status.c.id.label('stage_id'),
        issue_status_logs.c.id.label('issue_log_id'),
        issue_status_logs.c.up_date.label('iss_log_update'),
        (issue_status_logs.c.up_date - func.lag(
                issue_status_logs.c.up_date).over(
                issue_status_logs.c.issue_id)).
        label('mdiff'),
        (func.lead(
                issue_status_logs.c.issue_id).over(
                issue_status_logs.c.issue_id
                )).label('next_id'),
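        # Note: lead() returns the up_date of the *next* log of the same
        # issue (partitioned by issue_id, ordered by up_date), even though
        # the column is labelled 'prev_up_date'.
        (func.lead(
                issue_status_logs.c.up_date).over(
                issue_status_logs.c.issue_id,
                issue_status_logs.c.up_date,
                )).label('prev_up_date'),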
        issue_status_logs.c.user_id,
        (users.c.first_name + ' ' + users.c.last_name).
        label('updated_by_user'),
        ]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
    outerjoin(issue_status_logs,
              rt_issues.c.id == issue_status_logs.c.issue_id).
    outerjoin(users, issue_status_logs.c.user_id == users.c.id).
    outerjoin(rt_status,
              issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(issue_status_logs.c.issue_id,
             asc(issue_status_logs.c.up_date)).\
    group_by(
             issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id,
             rt_status.c.id,
             users.c.id
             )
rs = g.conn.execute(s)
if rs.rowcount > 0:
    for r in rs:
        # IMPT: For issue with no last status
        if not r[rt_status.c.is_last_status]:
            if not r['mdiff'] and (not r['next_id']):
                n = (mto_dt - r['iss_log_update'].replace(tzinfo=None))
            elif ((not r['mdiff']) and
                  (r['next_id'] == r['rtissue_id'])):
                n = (r['prev_up_date'] - r['iss_log_update'])
            else:
                n = (r['mdiff'])
            n =  (n.total_seconds()/60)
            if n > r[rt_status.c.duration_in_min]:
                mx = dict(r)
                q_user_wise_pendency_list.append(mx)

    for t in q_user_wise_pendency_list:
        if not t in temp_list:
            temp_list.append(t)
    q_user_wise_pendency_list = temp_list
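
For comparison, the successive-row check can also be done entirely in Python once the rows are fetched. The sketch below is not the code from this answer; it is a simplified illustration that assumes the rows are plain tuples copied from the sample issue_logs table (with the +05:30 offset dropped, so all datetimes are naive) and that every status allows 1 minute. The function name overdue_log_ids and the DURATION_MIN mapping are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# (id, issue_id, to_status, up_date), copied from the sample issue_logs rows.
LOGS = [
    (29, 20, 11, datetime(2018, 9, 14, 11, 43, 13, 907052)),
    (28, 20, 10, datetime(2018, 9, 14, 11, 42, 59, 612728)),
    (27, 20, 9, datetime(2018, 9, 11, 17, 45, 35, 138910)),
    (26, 19, 11, datetime(2018, 9, 6, 16, 37, 5, 935588)),
    (25, 19, 9, datetime(2018, 9, 6, 16, 27, 40, 543001)),
    (24, 18, 10, datetime(2018, 9, 5, 17, 13, 37, 568762)),
]
# status id -> allowed minutes (only the statuses used above).
DURATION_MIN = {9: 1, 10: 1, 11: 1}


def overdue_log_ids(logs, durations, start, end):
    """Return ids of logs whose status lasted longer than allowed."""
    overdue = []
    # Sort by issue, then by time, so consecutive rows are successive logs.
    ordered = sorted(logs, key=itemgetter(1, 3))
    for _, group in groupby(ordered, key=itemgetter(1)):
        rows = list(group)
        # Pair each log with the next log's up_date; the last log of an
        # issue is paired with the end of the date range instead.
        next_dates = [row[3] for row in rows[1:]] + [end]
        for (log_id, _, status, up_date), next_date in zip(rows, next_dates):
            if not (start <= up_date <= end):
                continue  # log falls outside the requested range
            if next_date - up_date > timedelta(minutes=durations[status]):
                overdue.append(log_id)
    return sorted(overdue)


result = overdue_log_ids(
    LOGS,
    DURATION_MIN,
    start=datetime(2018, 9, 6, 16, 34),
    end=datetime(2018, 9, 14, 12, 27),
)
print(result)
```

On the sample data this yields logs 26, 27, and 29. Doing the pairing in Python avoids window functions altogether, at the cost of fetching every log row in the range.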


Source: https://stackoverflow.com/questions/52330925/how-to-compare-2-successive-row-values-in-a-resultset-object-using-python
