Create conditional column for Date Difference based on matching values in two columns

天涯浪子 提交于 2019-12-24 07:26:30

问题


I have a dataframe, I am struggling to create a column based out of other columns, I will share the problem for a sample data.

        Date    Target1        Close
0   2019-04-17  209.2440    203.130005
1   2019-04-17  212.2155    203.130005
2   2019-04-17  213.6330    203.130005
3   2019-04-17  213.0555    203.130005
4   2019-04-17  212.6250    203.130005
5   2019-04-17  212.9820    203.130005
6   2019-04-17  213.1395    203.130005
7   2019-04-16  209.2860    199.250000
8   2019-04-16  209.9055    199.250000
9   2019-04-16  210.3045    199.250000

I want to create another column (for each observation) (called days_to_hit_target for example) which is the difference of days such that close hits (or comes very close to target of specific day), when it does closes that very closely, then it counts the difference of days and put them in the column days_to_hit_target.


回答1:


This should work:

daysAboveTarget = []
for i in range(len(df.Date)):
    try:
        dayAboveTarget = df.iloc[i:].loc[(df.Close > df.Target1[i])]['Date'].iloc[0]
    except IndexError:
        dayAboveTarget = None
    daysAboveTarget.append(dayAboveTarget)
daysAboveTarget = pd.Series(daysAboveTarget)
df['days_to_hit_target'] = daysAboveTarget - df.Date

I sort of overused iloc and loc here, so let me explain. The variable dayAboveTarget gets the date when the price closes above the target. The first iloc subsets the dataframe to only future dates, the first loc finds the actual results, the second iloc gets only the first result. We need the exception for days where the price never goes above target.




回答2:


NOTE I use python 3.7.1 and pandas 0.23.4. I came up with something very dirty; I am sure there is a neater and more efficient way of doing this.

### Create sample data
date_range = pd.date_range(start="1/1/2018", end="20/1/2018", freq="6H", closed="right")

target1 = np.random.uniform(10, 30, len(date_range))

close = [[i]*4 for i in np.random.uniform(10,30, len(date_range)//4)]
close_flat = np.array([item for sublist in close for item in sublist])

df = pd.DataFrame(np.array([np.array(date_range.date), target1,
    close_flat]).transpose(), columns=["date", "target", "close"])


### Create the column you need
# iterating over the days and finding days when the difference between
# "close" of current day and all "target" is lower than 0.25 OR the "target"
# value is greater than "close" value.
thresh = 0.25
date_diff_arr = np.zeros(len(df))
for i in range(0,len(df),4):
    diff_lt_thresh = df[(abs(df.target-df.close.iloc[i]) < thresh) | (df.target > df.close.iloc[i])]
    # only keep the findings from the next day onwards
    diff_lt_thresh = diff_lt_thresh.loc[i+4:]
    if not diff_lt_thresh.empty:
        # find day difference only if something under thresh is found
        days_diff = (diff_lt_thresh.iloc[0].date - df.iloc[i].date).days
    else:
        # otherwise write it as nan
        days_diff = np.nan
    # fill in the np.array which will be used to write to the df
    date_diff_arr[i:i+4] = days_diff

df["date_diff"] = date_diff_arr

Sample output:

0   2018-01-01    21.64  26.7319        2.0
1   2018-01-01  22.9047  26.7319        2.0
2   2018-01-01  26.0945  26.7319        2.0
3   2018-01-02  10.2155  26.7319        2.0
4   2018-01-02  17.5602  11.0507        1.0
5   2018-01-02  12.0368  11.0507        1.0
6   2018-01-02  19.5923  11.0507        1.0
7   2018-01-03  21.8168  11.0507        1.0
8   2018-01-03  11.5433  16.8862        1.0
9   2018-01-03  27.3739  16.8862        1.0
10  2018-01-03  26.9073  16.8862        1.0
11  2018-01-04  19.6677  16.8862        1.0
12  2018-01-04  25.3599  27.3373        1.0
13  2018-01-04  22.7479  27.3373        1.0
14  2018-01-04  18.7246  27.3373        1.0
15  2018-01-05  25.4122  27.3373        1.0
16  2018-01-05  28.3294  23.8469        1.0



回答3:


maybe a little faster solution:

import pandas as pd

# df is your DataFrame
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")

def days_to_hit(x, no_hit_default=None):
    return next(
        ((df["Date"].iloc[j+x.name] - x["Date"]).days 
         for j in range(len(df)-x.name) 
         if df["Close"].iloc[j+x.name] >= x["Target1"]), no_hit_default)

df["days_to_hit_target"] = df.apply(days_to_hit, axis=1)


来源:https://stackoverflow.com/questions/55964736/create-conditional-column-for-date-difference-based-on-matching-values-in-two-co

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!