python - Find value in dataframe closest to a specific time ago
I have a dataframe with a date-time column and a value column, and I'd like to find a way to create a column holding the value at the time closest to a given interval before each date-time.
What I'd like to have is a column called "value 2 hours ago", which would hold the value from the "value" column at the time closest to 2 hours ago.
For example, if the "date-time" column shows "01/01/2014 12:10:00", the new column would return the number in "value" from the line whose "date-time" is closest to "01/01/2014 10:10:00".
Even better would be if I could apply conditions on the value based on how far the real time interval is from the desired "2 hours" interval. For example: "return the value closest to 2 hours ago, except if it's less than 1 hour ago or more than 3 hours ago, in which case return nothing".
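To make that rule concrete, here is a minimal plain-Python sketch of the behaviour I have in mind. The helper name `value_near_offset` and its parameters are made up for illustration and are not part of pandas; it picks the row closest to the target time and returns nothing if that row falls outside the allowed window.

```python
from datetime import datetime, timedelta

def value_near_offset(rows, now,
                      offset=timedelta(hours=2),
                      min_age=timedelta(hours=1),
                      max_age=timedelta(hours=3)):
    """rows is a list of (timestamp, value) pairs (hypothetical layout).

    Return the value whose timestamp is closest to now - offset,
    or None if that timestamp is less than min_age or more than
    max_age before now.
    """
    if not rows:
        return None
    target = now - offset
    # Row whose timestamp is nearest to the target time
    ts, value = min(rows, key=lambda r: abs(r[0] - target))
    age = now - ts
    if age < min_age or age > max_age:
        return None
    return value

rows = [
    (datetime(2014, 1, 1, 4, 11), 9),
    (datetime(2014, 1, 1, 8, 10), 12),
    (datetime(2014, 1, 1, 9, 11), 3),
    (datetime(2014, 1, 1, 12, 10), 21),
]
print(value_near_offset(rows, datetime(2014, 1, 1, 12, 10)))  # 3 (from 09:11)
print(value_near_offset(rows, datetime(2014, 1, 1, 8, 10)))   # None (closest is 04:11, too old)
```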
To illustrate, here is a sample input dataframe. I can compute the time 2 hours ago and then self-merge on the two date-time columns. The challenge is that I have to merge on the nearest match, rather than an exact match.
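import pandas as pd

df = pd.DataFrame({
    'date-time': pd.Series(["01/01/2014 04:11:00", "01/01/2014 08:10:00", "01/01/2014 09:11:00", "01/01/2014 12:10:00"], index=['1', '2', '3', '4']),
    'value': pd.Series([9, 12, 3, 21], index=['1', '2', '3', '4'])})
# Parse the strings into timestamps and compute the time 2 hours earlier
df["time"] = pd.to_datetime(df["date-time"], format="%m/%d/%Y %H:%M:%S")
df["t_2h_ago"] = df["time"] - pd.to_timedelta('2h')
# Exact-match self-merge: only matches a row recorded exactly 2 hours earlier
merged = pd.merge(df, df, how='left', left_on='t_2h_ago', right_on='time')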
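Take the cartesian product of the dataframe with itself, then find the difference between each pair of timestamps. Note that I assumed each date-time is unique in the function named nearest_time. Group by the left-hand timestamp and calculate the min of each group; for each group, this gives the distance in seconds to the closest timestamp. Then join back to recover the matching row.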
from datetime import datetime
import time
import pandas as pd

df = pd.DataFrame({
    'date-time': pd.Series(["01/01/2014 04:11:00", "01/01/2014 08:10:00", "01/01/2014 09:11:00", "01/01/2014 12:10:00"], index=['1', '2', '3', '4']),
    'value': pd.Series([9, 12, 3, 21], index=['1', '2', '3', '4'])})

def nearest_time(x):
    # Absolute gap in seconds between the two timestamps of a paired row
    row_i = datetime.strptime(x['date-time_x'], "%m/%d/%Y %H:%M:%S")
    row_j = datetime.strptime(x['date-time_y'], "%m/%d/%Y %H:%M:%S")
    diff = time.mktime(row_i.timetuple()) - time.mktime(row_j.timetuple())  # seconds, e.g. 7200 for 2 hrs
    if diff == 0:
        # A row paired with itself (timestamps assumed unique) must never win
        diff = float('inf')
    return abs(diff)

df = df.copy()
df['key'] = 1
df = pd.merge(df, df, on='key')  # cartesian product via a constant key
df['diff'] = df.apply(nearest_time, axis=1)
# Smallest gap for each left-hand timestamp
df2 = df.groupby('date-time_x', as_index=False)['diff'].min()
# Join back to recover the row that achieves the minimum
df3 = pd.merge(df2, df, on=['diff', 'date-time_x'])
print(df3)
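As an alternative to the cartesian product, pandas also provides `pd.merge_asof`, which merges on the nearest key rather than an exact one. The sketch below is a swapped-in technique, not the approach above: it assumes a pandas version in which `merge_asof` supports `direction="nearest"`, and the column names `matched_time` and `value_2h_ago` are made up for illustration. A tolerance of 1 hour around the 2-hours-ago target approximates the "not less than 1 hour ago or more than 3 hours ago" rule; how the exact boundaries are treated is worth checking against the pandas documentation.

```python
import pandas as pd

# Sample data from the question, parsed into real timestamps
df = pd.DataFrame({
    "date-time": pd.to_datetime(
        ["01/01/2014 04:11:00", "01/01/2014 08:10:00",
         "01/01/2014 09:11:00", "01/01/2014 12:10:00"],
        format="%m/%d/%Y %H:%M:%S"),
    "value": [9, 12, 3, 21],
})

# merge_asof requires both frames to be sorted by their join keys
df = df.sort_values("date-time").reset_index(drop=True)

# Left side: each row plus the time we want to look up (2 hours earlier)
left = df.assign(t_2h_ago=df["date-time"] - pd.Timedelta(hours=2))

# Right side: the same data under illustrative (hypothetical) column names
right = df.rename(columns={"date-time": "matched_time",
                           "value": "value_2h_ago"})

result = pd.merge_asof(
    left, right,
    left_on="t_2h_ago", right_on="matched_time",
    direction="nearest",              # closest match, earlier or later
    tolerance=pd.Timedelta(hours=1),  # no match farther than 1 hour from the target
)
print(result)
```

With the sample data, the rows at 09:11 and 12:10 pick up the values 12 and 3 (recorded at 08:10 and 09:11), while the first two rows get NaN because no row lies within an hour of their targets.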