Pandas: Concatenation and Patching with concat and combine_first

会有一股神秘感 (submitted on 2020-03-17 09:44:00)


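The pandas pd.concat() function has a syntax similar to np.concatenate(), but it takes more parameters and is more powerful. The signature below comes from an older pandas release; join_axes was deprecated in pandas 0.25 and removed in 1.0: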

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False,
          copy=True)

1. Concatenation: concat

# Concatenation: concat
import numpy as np
import pandas as pd

s1 = pd.Series([1,2,3])
s2 = pd.Series([2,3,4])
s3 = pd.Series([1,2,3],index = ['a','c','h'])
s4 = pd.Series([2,3,4],index = ['b','e','d'])
print(pd.concat([s1,s2]))
print(pd.concat([s3,s4]).sort_index())
print('-----')
# Default axis=0: rows are stacked on rows

print(pd.concat([s3,s4], axis=1))
print('-----')
# axis=1: columns are placed side by side, producing a DataFrame
0    1
1    2
2    3
0    2
1    3
2    4
dtype: int64
a    1
b    2
c    2
d    4
e    3
h    3
dtype: int64
-----
     0    1
a  1.0  NaN
b  NaN  2.0
c  2.0  NaN
d  NaN  4.0
e  NaN  3.0
h  3.0  NaN
-----
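
The first output above keeps the labels 0, 1, 2 from both Series, so the result has duplicate index labels. As a minimal sketch of two parameters from the signature that deal with this (the variable names stacked, renumbered and duplicate_detected are illustrative, not from the original):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])

# Default behaviour: the original labels 0, 1, 2 appear twice.
stacked = pd.concat([s1, s2])
print(stacked.index.tolist())      # [0, 1, 2, 0, 1, 2]

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4, 5]

# verify_integrity=True raises ValueError when labels would be duplicated.
try:
    pd.concat([s1, s2], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
print(duplicate_detected)          # True
```

ignore_index is usually the simpler choice when the old labels carry no meaning; verify_integrity is more of a safety check for data where labels are expected to be unique.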

2. Join methods: join, join_axes

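# Join methods: join, join_axes

s5 = pd.Series([1,2,3],index = ['a','b','c'])
s6 = pd.Series([2,3,4],index = ['b','c','d'])
print(pd.concat([s5,s6], axis= 1))
print(pd.concat([s5,s6], axis= 1, join='inner'))
# pandas < 1.0: pd.concat([s5,s6], axis= 1, join_axes=[['a','b','d']])
# join_axes was removed in pandas 1.0; reindexing the outer result gives the same output here
print(pd.concat([s5,s6], axis= 1).reindex(['a','b','d']))
# join: {'inner', 'outer'}, default 'outer'. Controls how indexes on the other axis are handled: outer takes the union, inner takes the intersection.
# join_axes: the specific index to use for the result (older pandas only)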
     0    1
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
   0  1
b  2  2
c  3  3
     0    1
a  1.0  NaN
b  2.0  2.0
d  NaN  4.0
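
The join parameter works the same way when DataFrames are stacked along axis=0; in that case the "other axis" is the columns. A small sketch, using two made-up frames (left and right are hypothetical names, not from the original):

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
right = pd.DataFrame({'b': [5, 6], 'c': [7, 8]})

# outer: union of the column labels, missing cells become NaN
outer = pd.concat([left, right], join='outer', ignore_index=True)
print(outer.columns.tolist())   # ['a', 'b', 'c']

# inner: only the columns present in every input survive
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(inner.columns.tolist())   # ['b']
print(inner['b'].tolist())      # [3, 4, 5, 6]
```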

3. Overriding column names (rarely used; for reference only)

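# Overriding column names (rarely used; for reference only)

sre = pd.concat([s5,s6], keys = ['one','two'])
print(sre,type(sre))
print(sre.index)
print('-----')
# keys: sequence, default None. Builds a hierarchical index with the passed keys as the outermost level

sre = pd.concat([s5,s6], axis=1, keys = ['one','two'])
print(sre,type(sre))
# With axis=1, the keys become the column names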
one  a    1
     b    2
     c    3
two  b    2
     c    3
     d    4
dtype: int64 <class 'pandas.core.series.Series'>
MultiIndex(levels=[['one', 'two'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 1, 2, 3]])
-----
   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0 <class 'pandas.core.frame.DataFrame'>
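
One practical use of keys along axis=0 is pulling a source object back out of the combined result by its outer label. The sketch below also passes names, which appears in the signature at the top; the level names 'source' and 'label' are illustrative choices, not from the original:

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])

# keys builds the outer index level; names labels both levels
sre = pd.concat([s5, s6], keys=['one', 'two'], names=['source', 'label'])

# Selecting by the outer key returns the rows that came from s5
first = sre.loc['one']
print(first.index.tolist())     # ['a', 'b', 'c']
print(first.tolist())           # [1, 2, 3]

# A (outer, inner) tuple addresses a single value
print(sre.loc[('two', 'd')])    # 4
print(list(sre.index.names))    # ['source', 'label']
```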

4. Patching: DataFrame.combine_first()

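# Patching: combine_first() is a DataFrame/Series method, not a pd.* function

df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan],[np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],index=[1, 2])
print(df1)
print(df2)
print(df1.combine_first(df2))
print('-----')
# Aligned on index: missing values in df1 are filled from df2
# If df2 has index labels that df1 lacks, those rows are added to the result, e.g. index=['a',1]

df1.update(df2)
print(df1)
# update: df2 overwrites df1 in place at matching index positions (NaN values in df2 are skipped)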
     0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN  NaN
2  NaN  7.0  NaN
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
     0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN -8.2
2 -5.0  7.0  4.0
-----
      0    1    2
0   NaN  3.0  5.0
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
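
The difference between the two methods can be checked on a smaller case. This is a sketch with made-up frames (base, patch, filled and target are hypothetical names): combine_first returns a new object over the union of both indexes and keeps the caller's existing values, while update changes the caller in place, returns None, and never adds rows.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'x': [np.nan, 2.0]}, index=['a', 'b'])
patch = pd.DataFrame({'x': [10.0, 20.0, 30.0]}, index=['a', 'b', 'c'])

# combine_first: only the hole at 'a' is filled, 'b' keeps 2.0,
# and the extra row 'c' from patch is added to the result.
filled = base.combine_first(patch)
print(filled.index.tolist())    # ['a', 'b', 'c']
print(filled['x'].tolist())     # [10.0, 2.0, 30.0]

# update: works in place on a copy here, overwrites 'b' as well,
# and ignores 'c' because it is not in the caller's index.
target = base.copy()
result = target.update(patch)
print(result)                   # None
print(target['x'].tolist())     # [10.0, 20.0]
```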

Homework

Exercise 1: Create DataFrames df1 and df2 as required, then concatenate them into df3

# Exercise 8
# np.random.rand(4, 2) directly produces a 4x2 array of random values
df9 = pd.DataFrame(np.random.rand(4, 2),
                   index=['a','b','c','d'],
                   columns=['values1','values2'])
df10 = pd.DataFrame(np.random.rand(4, 2),
                    index=['e','f','g','h'],
                    columns=['values1','values2'])
print('df9:\n',df9)
print('------------------')
print('df10:\n',df10)
print('------------------')
print(pd.concat([df9,df10],axis=0))
df9:
     values1   values2
a  0.318752  0.892507
b  0.549810  0.079431
c  0.256229  0.940616
d  0.503575  0.000695
------------------
df10:
     values1   values2
e  0.990803  0.063776
f  0.930142  0.131978
g  0.333706  0.502107
h  0.291340  0.189546
------------------
    values1   values2
a  0.318752  0.892507
b  0.549810  0.079431
c  0.256229  0.940616
d  0.503575  0.000695
e  0.990803  0.063776
f  0.930142  0.131978
g  0.333706  0.502107
h  0.291340  0.189546

Exercise 2: Create DataFrames df1 and df2 as required, then patch df1 with the values from df2 to produce df3

# Exercise 9
df11 = pd.DataFrame(np.random.rand(4, 2),
                    index=['a','b','c','d'],
                    columns=['values1','values2'])
# np.arange(8).reshape(4, 2) gives the rows [0, 1], [2, 3], [4, 5], [6, 7]
df12 = pd.DataFrame(np.arange(8).reshape(4, 2),
                    index=['a','b','c','d'],
                    columns=['values1','values2'])
# A single .loc call avoids chained assignment, which may not modify df11 itself
df11.loc[['b','c'], 'values1'] = np.nan
print('df11:\n',df11)
print('------------------')
print('df12:\n',df12)
print('------------------')
print('After patching:\n',df11.combine_first(df12))
df11:
     values1   values2
a  0.154783  0.190205
b       NaN  0.639978
c       NaN  0.324346
d  0.949075  0.849396
------------------
df12:
    values1  values2
a        0        1
b        2        3
c        4        5
d        6        7
------------------
After patching:
     values1   values2
a  0.154783  0.190205
b  2.000000  0.639978
c  4.000000  0.324346
d  0.949075  0.849396
