Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I have the following data frames.

df_1:

order_id   date
123        2020-01-01
456        NaT
789        2020-10-10
135        2020-05-31
234        NaT
111        NaT

df_2:

order_id   date
123        2020-01-02
456        2021-01-01
789        2020-10-11
135        2020-06-01

The output should capture every row where the date changes to a later date than the previous entry, and/or where NaT changes to a new date.

new_df should equal:

order_id   date
123        2020-01-02
456        2021-01-01
789        2020-10-11
135        2020-06-01

What I have tried:

df_1['date'] = pd.to_datetime(df_1['date'])
df_2['date'] = pd.to_datetime(df_2['date'])
s = df_2.set_index('order_id')['date']

mapped = df_1['order_id'].map(s)
mask = mapped > df_1['date']
df_1.loc[mask, 'date'] = mapped

This only captures rows where the date changes to a later date; it does not capture rows where NaT becomes a new date.
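
One likely reason, offered as a sketch rather than a diagnosis of the exact frames above: in pandas, an ordering comparison against NaT evaluates to False, so rows whose old date is NaT never enter the mask. The two-row series below (`old`, `new`) are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical two-row example: the first old date is missing (NaT).
old = pd.Series(pd.to_datetime([None, "2020-01-01"]))
new = pd.Series(pd.to_datetime(["2021-01-01", "2020-01-02"]))

# Comparisons against NaT are False, so row 0 is not selected here.
later = new > old

# Adding an explicit NaT check selects both rows.
later_or_filled = later | (old.isna() & new.notna())
```

Here `later` flags only the second row, while `later_or_filled` flags both.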


He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…
4.5k views

1 Answer

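Coerce both date columns to datetime, merge the two frames on order_id, then use np.where to keep the new date wherever the old date is NaT or earlier than the new one.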

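import numpy as np
import pandas as pd

df_1['date'] = pd.to_datetime(df_1['date'])
df_2['date'] = pd.to_datetime(df_2['date'])
# Left-merge on order_id: df_2's date becomes 'date_left', df_1's date stays 'date'
df = pd.merge(df_2, df_1, how='left', on='order_id', suffixes=('_left', ''))
# Take the new date when the old one is NaT or the new one is at least a day later
use_new = df['date'].isna() | df['date_left'].sub(df['date']).dt.days.gt(0)
df = df.assign(date=np.where(use_new, df['date_left'], df['date'])).drop(columns='date_left')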



  order_id       date
0       123 2020-01-02
1       456 2021-01-01
2       789 2020-10-11
3       135 2020-06-01
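
As an alternative, the map-based attempt from the question can probably be kept with one extra condition on the mask. The following is a minimal sketch, not the answer's method: it rebuilds the sample frames from the question, assumes order_id is unique in df_2 (required for `map` against an indexed Series), and treats an order as changed when its new date is later or when NaT was replaced by a real date.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "order_id": [123, 456, 789, 135, 234, 111],
    "date": ["2020-01-01", None, "2020-10-10", "2020-05-31", None, None],
})
df_2 = pd.DataFrame({
    "order_id": [123, 456, 789, 135],
    "date": ["2020-01-02", "2021-01-01", "2020-10-11", "2020-06-01"],
})
df_1["date"] = pd.to_datetime(df_1["date"])
df_2["date"] = pd.to_datetime(df_2["date"])

# Look up each order's new date; orders missing from df_2 map to NaT.
mapped = df_1["order_id"].map(df_2.set_index("order_id")["date"])

# Changed if the new date is later, or if NaT was replaced by a real date.
mask = (mapped > df_1["date"]) | (df_1["date"].isna() & mapped.notna())

# Keep only the changed orders, paired with their new dates.
new_df = (
    df_1.loc[mask, ["order_id"]]
    .assign(date=mapped[mask])
    .reset_index(drop=True)
)
```

With the sample data, orders 234 and 111 are left out because they have no entry in df_2, so `new_df` holds the same four rows as the expected output.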
