Question:
I want to merge two data frames on their Date_Time column (dtype: date-time). The Date_Time columns share some values and differ in others. I am unable to merge them so that every unique date-time row ends up in the result, with NA in the columns that have no match. Instead, I get NAs in the Date_Time column for the second data frame. I tried both R and Python.
Python code:
df = pd.merge(df_met, df_so2, how='left', on='Date_Time')
In R (Date_Time is a date-time column created with as.POSIXct):
library(plyr)  # join() comes from the plyr package
df_2 <- join(so2, met_km, type="inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")
df_so2:
X  POC  Datum  Date_Time        Date_GMT          Sample.Measurement  MDL
1  2    WGS84  2015-01-01 3:00  01/01/2015 09:00  2.3                 0.2
2  2    WGS84  2015-01-01 4:00  01/01/2015 10:00  2.5                 0.2
3  2    WGS84  2015-01-01 5:00  01/01/2015 11:00  2.1                 0.2
4  2    WGS84  2015-01-01 6:00  01/01/2015 12:00  2.3                 0.2
5  2    WGS84  2015-01-01 7:00  01/01/2015 13:00  1.1                 0.2
df_met:
X  Date_Time        air_temp_set_1  dew_point_temperature_set_1
1  2015-01-01 1:00  35.6            35.6
2  2015-01-01 2:00  35.6            35.6
3  2015-01-01 3:00  35.6            35.6
4  2015-01-01 4:00  33.8            33.8
5  2015-01-01 5:00  33.2            33.2
6  2015-01-01 6:00  33.8            33.8
7  2015-01-01 7:00  33.8            33.8
Expected Output:
   X    POC  Datum  Date_Time        Date_GMT          Sample.Measurement  MDL
1  1.0  2    WGS84  2015-01-01 3:00  01/01/2015 09:00  2.3                 0.2
2  2.0  2    WGS84  2015-01-01 4:00  01/01/2015 10:00  2.5                 0.2
3  NaN  NaN  NaN    2015-01-01 1:00  NaN               NaN                 NaN
4  NaN  NaN  NaN    2015-01-01 2:00  NaN               NaN                 NaN
Answer 1:
Merging with how='outer' should keep them all:
- pandas.DataFrame.merge
  - outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
- Based on your comment, you want all the dates, not just those shown in Expected Output.
- Add the parameter sort=True if you want the result sorted by date.
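Here is a minimal, self-contained sketch of the outer merge. It assumes Date_Time has already been converted to datetime64 in both frames. The two small frames are hypothetical, trimmed-down stand-ins for df_so2 and df_met, each with only one value column:

```python
import pandas as pd

# Hypothetical miniature versions of df_so2 and df_met (one value column each)
df_so2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 03:00", "2015-01-01 04:00", "2015-01-01 05:00"]),
    "Sample.Measurement": [2.3, 2.5, 2.1],
})
df_met = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 01:00", "2015-01-01 02:00", "2015-01-01 03:00"]),
    "air_temp_set_1": [35.6, 35.6, 35.6],
})

# how="outer" keeps the union of Date_Time keys from both frames;
# sort=True orders the result by the join key.
df_exp = pd.merge(df_so2, df_met, on="Date_Time", how="outer", sort=True)
print(df_exp)
```

Rows whose Date_Time exists in only one frame get NaN in the other frame's columns. The key column itself stays filled for every row.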
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')
X_x  POC  Datum  Date_Time        Date_GMT          Sample.Measurement  MDL  X_y  air_temp_set_1  dew_point_temperature_set_1
1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00  2.3                 0.2  3    35.6            35.6
2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00  2.5                 0.2  4    33.8            33.8
3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00  2.1                 0.2  5    33.2            33.2
4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00  2.3                 0.2  6    33.8            33.8
5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00  1.1                 0.2  7    33.8            33.8
NaN  NaN  NaN    2015-01-01 1:00  NaN               NaN                 NaN  1    35.6            35.6
NaN  NaN  NaN    2015-01-01 2:00  NaN               NaN                 NaN  2    35.6            35.6
To keep only the df_so2 columns, drop the ones that came from df_met and restore the X name:
df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)
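The original answer drops and renames columns after the merge. An alternative, not part of that answer, is to pass only the key column of df_met into the merge. Then no X_x / X_y suffixes are created in the first place. This sketch uses small hypothetical frames:

```python
import pandas as pd

# Hypothetical miniature frames; both have an "X" column, like the question's data
df_so2 = pd.DataFrame({
    "X": [1, 2],
    "Date_Time": pd.to_datetime(["2015-01-01 03:00", "2015-01-01 04:00"]),
    "Sample.Measurement": [2.3, 2.5],
})
df_met = pd.DataFrame({
    "X": [1, 2, 3],
    "Date_Time": pd.to_datetime(["2015-01-01 01:00", "2015-01-01 02:00", "2015-01-01 03:00"]),
    "air_temp_set_1": [35.6, 35.6, 35.6],
})

# Selecting only the key column from df_met means the merge adds new
# Date_Time rows but no extra columns, so nothing needs dropping or renaming.
df_exp = pd.merge(df_so2, df_met[["Date_Time"]], on="Date_Time", how="outer", sort=True)
print(df_exp)
```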
X    POC  Datum  Date_Time        Date_GMT          Sample.Measurement  MDL
1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00  2.3                 0.2
2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00  2.5                 0.2
3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00  2.1                 0.2
4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00  2.3                 0.2
5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00  1.1                 0.2
NaN  NaN  NaN    2015-01-01 1:00  NaN               NaN                 NaN
NaN  NaN  NaN    2015-01-01 2:00  NaN               NaN                 NaN
Answer 2:
In R, all = TRUE performs a full outer join on Date_Time:
merge(df_so2, df_met, by = "Date_Time", all = TRUE)
   Date_Time        X.x  POC  Datum  Date_GMT          Sample.Measurement  MDL  X.y  air_temp_set_1  dew_point_temperature_set_1
1  2015-01-01 1:00  NA   NA   <NA>   <NA>              NA                  NA   1    35.6            35.6
2  2015-01-01 2:00  NA   NA   <NA>   <NA>              NA                  NA   2    35.6            35.6
3  2015-01-01 3:00  1    2    WGS84  01/01/2015 09:00  2.3                 0.2  3    35.6            35.6
4  2015-01-01 4:00  2    2    WGS84  01/01/2015 10:00  2.5                 0.2  4    33.8            33.8
5  2015-01-01 5:00  3    2    WGS84  01/01/2015 11:00  2.1                 0.2  5    33.2            33.2
6  2015-01-01 6:00  4    2    WGS84  01/01/2015 12:00  2.3                 0.2  6    33.8            33.8
7  2015-01-01 7:00  5    2    WGS84  01/01/2015 13:00  1.1                 0.2  7    33.8            33.8
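On the pandas side, the indicator parameter of pd.merge labels where each row of an outer merge came from. This makes the date-times unique to either frame easy to spot. The frames below are hypothetical miniatures of the question's data:

```python
import pandas as pd

# Hypothetical miniature frames with one shared timestamp (03:00)
df_so2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 03:00", "2015-01-01 04:00"]),
    "MDL": [0.2, 0.2],
})
df_met = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 02:00", "2015-01-01 03:00"]),
    "air_temp_set_1": [35.6, 35.6],
})

# indicator=True adds a categorical "_merge" column whose values are
# "left_only", "right_only" or "both"
merged = pd.merge(df_so2, df_met, on="Date_Time", how="outer", sort=True, indicator=True)
print(merged[["Date_Time", "_merge"]])
```

Filtering on merged["_merge"] != "both" then returns only the unmatched date-times.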
Answer 3:
- Whoever is reading this, don't downvote. I'm working with the OP to resolve his error, then we'll delete this answer.
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')
I got:
POC  Datum  Date_Time        Date_GMT          Sample.Measurement  MDL  air_temp_set_1  dew_point_temperature_set_1  relative_humidity_set_1  wind_speed_set_1  cloud_layer_1_code_set_1  wind_direction_set_1  pressure_set_1d  weather_cond_code_set_1  visibility_set_1  wind_cardinal_direction_set_1d  weather_condition_set_1d
2    WGS84  2015-01-01 3:00  01/01/2015 09:00  2.3                 0.2  35.6            35.6                         100.0                    0.0               14.0                      0.0                   29.943333        9.0                      0.25              N                               Fog
1    WGS84  2015-01-01 3:00  01/01/2015 09:00  0.6                 2.0  35.6            35.6                         100.0                    0.0               14.0                      0.0                   29.943333        9.0                      0.25              N                               Fog
1    WGS84  2015-01-01 3:00  01/01/2015 12:00  7.4                 0.2  35.6            35.6                         100.0                    0.0               14.0                      0.0                   29.943333        9.0                      0.25              N                               Fog
1    WGS84  2015-01-01 3:00  01/01/2015 10:00  1.0                 0.2  35.6            NaN                          NaN                      NaN               NaN                       NaN                   NaN              NaN                      NaN               NaN                             NaN
Notes:
- Check df_met.info() and df_so2.info() and verify that Date_Time is non-null datetime64[ns].
- If it is not, convert both columns first:
df_so2['Date_Time'] = pd.to_datetime(df_so2['Date_Time'])
df_met['Date_Time'] = pd.to_datetime(df_met['Date_Time'])
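The following small illustration shows why the conversion matters. It uses hypothetical data in which the same timestamps are stored as strings in two different formats. Compared as text the keys never match, but after pd.to_datetime they do:

```python
import pandas as pd

# Hypothetical data: identical timestamps written in two different string formats
df_so2 = pd.DataFrame({"Date_Time": ["2015-01-01 03:00", "2015-01-01 04:00"], "MDL": [0.2, 0.2]})
df_met = pd.DataFrame({"Date_Time": ["2015-01-01 03:00:00", "2015-01-01 04:00:00"],
                       "air_temp_set_1": [35.6, 33.8]})

# As plain strings the keys differ, so an inner merge finds no matches
as_strings = pd.merge(df_so2, df_met, on="Date_Time", how="inner")

# After conversion both columns hold the same datetime values, so the keys line up
df_so2["Date_Time"] = pd.to_datetime(df_so2["Date_Time"])
df_met["Date_Time"] = pd.to_datetime(df_met["Date_Time"])
as_datetimes = pd.merge(df_so2, df_met, on="Date_Time", how="inner")

print(len(as_strings), len(as_datetimes))
df_so2.info()  # Date_Time should now be listed as a datetime64 column
```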
Source: https://stackoverflow.com/questions/57932570/merging-data-on-date-time-column-posixct-format