Question:
I have a table issue_logs:
id | issue_id | from_status | to_status | up_date | remarks
----+----------+-------------+-----------+----------------------------------+-----------
29 | 20 | 10 | 11 | 2018-09-14 11:43:13.907052+05:30 | UPDATED
28 | 20 | 9 | 10 | 2018-09-14 11:42:59.612728+05:30 | UPDATED
27 | 20 | | 9 | 2018-09-11 17:45:35.13891+05:30 | NEW issue
26 | 19 | 9 | 11 | 2018-09-06 16:37:05.935588+05:30 | UPDATED
25 | 19 | | 9 | 2018-09-06 16:27:40.543001+05:30 | NEW issue
24 | 18 | 11 | 10 | 2018-09-05 17:13:37.568762+05:30 | UPDATED
and rt_status:
id | description | duration_in_min
----+------------------+-----------------
1 | new | 1
2 | working | 1
3 | approval pending | 1
4 | resolved | 1
5 | initial check | 1
6 | parts purchase | 1
7 | shipment | 1
8 | close | 1
9 | initial check | 1
10 | parts purchase | 1
11 | shipment | 1
12 | close | 1
For a date range from_datetime = '2018-09-06T16:34' to to_datetime = '2018-09-14T12:27' I want to select all the issues that have exceeded the duration_in_min set for each status in the rt_status table. From issue_logs I should get the records with ids 29, 27, and 26. For the records with ids 29 and 26, the check should consider the time elapsed between their last up_date and to_datetime.
I would like to use func.lag with over() to do it, but I'm unable to get the correct records. I am using PostgreSQL 9.6 and Python 2.7. How exactly can I get func.lag or func.lead to work using SQLAlchemy Core only?
What I tried:
s = select([
    rt_issues.c.id.label('rtissue_id'),
    rt_issues,
    rt_status.c.duration_in_min,
    rt_status.c.id.label('stage_id'),
    issue_status_logs.c.id.label('issue_log_id'),
    issue_status_logs.c.up_date.label('iss_log_update'),
    (issue_status_logs.c.up_date - func.lag(
        issue_status_logs.c.up_date).over(
            issue_status_logs.c.issue_id)).label('mdiff'),
]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
                outerjoin(issue_status_logs,
                          rt_issues.c.id == issue_status_logs.c.issue_id).
                outerjoin(rt_status,
                          issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(asc(issue_status_logs.c.up_date),
             issue_status_logs.c.issue_id).\
    group_by(issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id)

rs = g.conn.execute(s)
mcnt = rs.rowcount
print mcnt, 'rowcount'
if rs.rowcount > 0:
    for r in rs:
        print dict(r)
This yields results that include wrong records, e.g. the issue log with id 28. Can anyone help rectify the error?
Answer 1:
Though you managed to solve the question yourself, here's one take on it that does not use the window functions lag() or lead(). To compare the up_date timestamps of consecutive issue logs you can use a self left join. In SQL the query could look like:
select ilx.id
from issue_logs ilx
join rt_status rsx on rsx.id = ilx.to_status
left join issue_logs ily on ily.from_status = ilx.to_status
                        and ily.issue_id = ilx.issue_id
where ilx.up_date >= '2018-09-06T16:34'
  and ilx.up_date <= (coalesce(ily.up_date, '2018-09-14T12:27') -
                      interval '1 minute' * rsx.duration_in_min);
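To check why this join condition keeps ids 29, 27, and 26 but drops 28, 25, and 24, here is a plain-Python sketch (standard library only; data copied from the question's sample rows, with timezone offsets dropped for brevity) of the same logic — the `logs` tuples and `exceeded` helper are illustrative names, not part of the actual schema:

```python
from datetime import datetime, timedelta

# (id, issue_id, from_status, to_status, up_date) from the question's sample
logs = [
    (29, 20, 10,   11, datetime(2018, 9, 14, 11, 43, 13)),
    (28, 20, 9,    10, datetime(2018, 9, 14, 11, 42, 59)),
    (27, 20, None, 9,  datetime(2018, 9, 11, 17, 45, 35)),
    (26, 19, 9,    11, datetime(2018, 9, 6, 16, 37, 5)),
    (25, 19, None, 9,  datetime(2018, 9, 6, 16, 27, 40)),
    (24, 18, 11,   10, datetime(2018, 9, 5, 17, 13, 37)),
]
duration_in_min = {sid: 1 for sid in range(1, 13)}  # every status allows 1 min

from_dt = datetime(2018, 9, 6, 16, 34)
to_dt = datetime(2018, 9, 14, 12, 27)

def exceeded(log):
    _id, issue_id, _from, to_status, up_date = log
    # the left-join side: the log that left this status, if any
    nxt = next((l[4] for l in logs
                if l[1] == issue_id and l[2] == to_status), None)
    # coalesce(ily.up_date, to_datetime) - allowed duration
    deadline = (nxt or to_dt) - timedelta(minutes=duration_in_min[to_status])
    return from_dt <= up_date <= deadline

print([l[0] for l in logs if exceeded(l)])  # → [29, 27, 26]
```

Id 28 is dropped because issue 20 left status 10 only 14 seconds later, and id 25 because its up_date falls before from_datetime.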
and the same in SQLAlchemy SQL Expression Language:
from_datetime = '2018-09-06T16:34'
to_datetime = '2018-09-14T12:27'

ilx = issue_status_logs.alias()
ily = issue_status_logs.alias()
rsx = rt_status

query = select([ilx.c.id]).\
    select_from(
        ilx.
        join(rsx, rsx.c.id == ilx.c.to_status).
        outerjoin(ily, and_(ily.c.from_status == ilx.c.to_status,
                            ily.c.issue_id == ilx.c.issue_id))).\
    where(and_(ilx.c.up_date >= from_datetime,
               ilx.c.up_date <= (func.coalesce(ily.c.up_date, to_datetime) -
                                 cast('1 minute', Interval) *
                                 rsx.c.duration_in_min)))
Answer 2:
My solution, using a modified SQLAlchemy expression language query:
s = select([
    rt_issues.c.id.label('rtissue_id'),
    rt_issues.c.title,
    rt_status.c.duration_in_min,
    rt_status.c.is_last_status,
    rt_status.c.id.label('stage_id'),
    issue_status_logs.c.id.label('issue_log_id'),
    issue_status_logs.c.up_date.label('iss_log_update'),
    # time elapsed since the previous log of the same issue
    (issue_status_logs.c.up_date - func.lag(
        issue_status_logs.c.up_date).over(
            issue_status_logs.c.issue_id)).label('mdiff'),
    (func.lead(
        issue_status_logs.c.issue_id).over(
            issue_status_logs.c.issue_id)).label('next_id'),
    # note: despite the label, lead() returns the *next* row's up_date
    (func.lead(
        issue_status_logs.c.up_date).over(
            issue_status_logs.c.issue_id,
            issue_status_logs.c.up_date)).label('prev_up_date'),
    issue_status_logs.c.user_id,
    (users.c.first_name + ' ' + users.c.last_name).label('updated_by_user'),
]).\
    where(and_(*conditions)).\
    select_from(rt_issues.
                outerjoin(issue_status_logs,
                          rt_issues.c.id == issue_status_logs.c.issue_id).
                outerjoin(users, issue_status_logs.c.user_id == users.c.id).
                outerjoin(rt_status,
                          issue_status_logs.c.to_status == rt_status.c.id)).\
    order_by(issue_status_logs.c.issue_id,
             asc(issue_status_logs.c.up_date)).\
    group_by(issue_status_logs.c.issue_id,
             rt_issues.c.id,
             issue_status_logs.c.id,
             rt_status.c.id,
             users.c.id)

rs = g.conn.execute(s)
if rs.rowcount > 0:
    for r in rs:
        # IMPT: for an issue that is not yet in its last status
        if not r[rt_status.c.is_last_status]:
            if not r['mdiff'] and (not r['next_id']):
                # only log of its issue: measure against to_datetime
                # (mto_dt is to_datetime as a naive datetime)
                n = (mto_dt - r['iss_log_update'].replace(tzinfo=None))
            elif ((not r['mdiff']) and
                  (r['next_id'] == r['rtissue_id'])):
                # first log of its issue: measure against the next log
                n = (r['prev_up_date'] - r['iss_log_update'])
            else:
                n = (r['mdiff'])
            n = (n.total_seconds() / 60)
            if n > r[rt_status.c.duration_in_min]:
                mx = dict(r)
                q_user_wise_pendency_list.append(mx)

# drop duplicate rows while preserving order
for t in q_user_wise_pendency_list:
    if t not in temp_list:
        temp_list.append(t)
q_user_wise_pendency_list = temp_list
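As a side note, the lag() semantics this query relies on for 'mdiff' can be sketched in plain Python (standard library only; sample up_date values from the question, naive timestamps; `rows` and `mdiff` are illustrative names): within each issue_id partition, ordered by up_date, lag() looks at the previous row, so the difference is the time the issue spent in its previous status.

```python
from datetime import datetime
from itertools import groupby

# (issue_log_id, issue_id, up_date) rows from the question's sample
rows = [
    (27, 20, datetime(2018, 9, 11, 17, 45, 35)),
    (28, 20, datetime(2018, 9, 14, 11, 42, 59)),
    (29, 20, datetime(2018, 9, 14, 11, 43, 13)),
    (25, 19, datetime(2018, 9, 6, 16, 27, 40)),
    (26, 19, datetime(2018, 9, 6, 16, 37, 5)),
]

mdiff = {}  # issue_log_id -> time since the previous log, or None
for issue_id, grp in groupby(sorted(rows, key=lambda r: (r[1], r[2])),
                             key=lambda r: r[1]):
    grp = list(grp)
    for i, (log_id, _, up_date) in enumerate(grp):
        # lag(up_date) over (partition by issue_id order by up_date)
        mdiff[log_id] = up_date - grp[i - 1][2] if i else None

print(mdiff[29])  # → 0:00:14 (issue 20 spent 14 s in status 10)
```

The first log of each issue has no predecessor, so its mdiff is NULL — which is exactly the case the `if not r['mdiff']` branches above have to handle.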
Source: https://stackoverflow.com/questions/52330925/how-to-compare-2-successive-row-values-in-a-resultset-object-using-python