Sliding windows - measuring length of observations on each looped window

倖福魔咒の submitted on 2020-08-10 19:32:05

Question


Consider this sample code, where zip() creates sliding windows over a dataset and returns them in a loop.

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

for x, y in zip(months, months[1:]):
    print(x, y)

# Output of each window will be:
Jan Feb 
Feb Mar
Mar Apr
Apr May

Now suppose I want to calculate, for each window, what percentage of the window's total length the first month accounts for.

Example in steps:

  1. When returning the first window (Jan Feb), I want to calculate the % length of Jan over the full window (which equals Jan + Feb) and store it in a new variable
  2. When returning the second window (Feb Mar), I want to calculate the % length of Feb over the full window (which equals Feb + Mar) and store it in a new variable
  3. Continue this process until the last window

Any suggestions on how I might implement this idea in the for loop are welcome!

Thank you!

EDIT

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)

# Output of each window will be:
Jan Feb Mar
Feb Mar Apr
Mar Apr May

The goal is to calculate the length of the first two months of each window over the full window length:

  • 1st window: (Jan + Feb) / (Jan + Feb + Mar)
  • 2nd window: (Feb + Mar) / (Feb + Mar + Apr)
  • continuing to the last window

We can now calculate one month over the size of each window (with start.month). However, how do we adapt this to include more than one month?

Also, instead of using days_in_month, would there be a way to use the length of the datapoints (rows) in each month?

By length of datapoints (rows), I mean that each month contains many timestamped datapoints (e.g., at 60-minute intervals), so one day in a month has 24 datapoints (rows). Example:

                         column
rows             
01-Jan-2010 T00:00        value
01-Jan-2010 T01:00        value
01-Jan-2010 T02:00        value
...                       ...
01-Jan-2010 T23:00        value
02-Jan-2010 T00:00        value
...                       ...
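
A minimal sketch of the row-count idea, assuming the data sits in a pandas DataFrame with an hourly DatetimeIndex. The frame `df`, its synthetic values, and the names `rows_per_month` and `fracs` are hypothetical and only mirror the layout shown above; this is one possible approach, not a definitive one.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data covering Jan-May 2010 (24 rows per day).
idx = pd.date_range('2010-01-01 00:00', '2010-05-31 23:00',
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'column': np.arange(len(idx))}, index=idx)

# Count the rows that fall in each calendar month.
# The result is a Series indexed by monthly Period objects.
rows_per_month = df.groupby(df.index.to_period('M')).size()

months = rows_per_month.index
fracs = {}
for start, end in zip(months, months[1:]):
    front_rows = rows_per_month.loc[start]
    total_rows = rows_per_month.loc[start] + rows_per_month.loc[end]
    # Share of the two-month window's rows that belong to the first month.
    fracs[str(start)] = front_rows / total_rows
    print(start, end, front_rows, total_rows, round(fracs[str(start)], 3))
```

With this synthetic index, January 2010 has 744 rows and February has 672, so the first window's fraction is 744 / 1416, roughly 0.525. With real data, months with missing rows would shift the fractions accordingly.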

Thank you!


Answer 1:


Here is one way. (In my case, months is a period_range object.)

import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')

Now iterate over the range. Each iteration is a two-month window.

# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s} '.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))

for start, end in zip(months, months[1:]):
    front_month = start.month

    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month

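    # number of days in current sliding window (e.g., Jan + Feb);
    # measure up to the start of the month after `end`, because end.end_time
    # is 23:59:59.999999999 and .days would otherwise truncate to one day short
    days_in_curr_window = ((end + 1).start_time - start.start_time).days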

    frac = front_month_days / days_in_curr_window

    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))


start      end             month  front (d)  total (d)       frac 
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
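
For the three-month windows described in the question's EDIT, one possible extension is to slice the period range by position and sum days_in_month over the leading months. This is a sketch under stated assumptions: the names `window`, `front`, and `results` are illustrative and not part of the original answer.

```python
import pandas as pd

months = pd.period_range(start='2020-01', periods=5, freq='M')

window = 3  # months per sliding window (assumed from the question's EDIT)
front = 2   # number of leading months whose share is measured

results = []
for i in range(len(months) - window + 1):
    win = months[i:i + window]

    # days in the leading months (e.g., Jan + Feb)
    front_days = sum(p.days_in_month for p in win[:front])

    # days in the whole window (e.g., Jan + Feb + Mar)
    total_days = sum(p.days_in_month for p in win)

    frac = front_days / total_days
    results.append((str(win[0]), str(win[-1]), front_days, total_days, frac))

    print('{:10s} {:10s} {:10d} {:10d} {:10.3f}'.format(
        str(win[0]), str(win[-1]), front_days, total_days, frac))
```

For the first window (2020-01 to 2020-03) this gives 60 / 91, roughly 0.659. Changing `window` and `front` adapts the same loop to other window sizes.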


Source: https://stackoverflow.com/questions/63230518/sliding-windows-measuring-length-of-observations-on-each-looped-window
