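pandas's pd.concat() has syntax similar to np.concatenate(), but it accepts more parameters and is more powerful. Its signature, as given in older pandas versions:
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
copy=True)
Newer versions drop join_axes (deprecated in 0.25, removed in 1.0) and add a sort parameter.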
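The signature lists ignore_index and verify_integrity, but neither appears in the examples that follow. Here is a minimal sketch of both. It reuses Series shaped like s1 and s2 from section 1:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])

# Default: the original index labels are kept, so 0, 1, 2 appear twice
stacked = pd.concat([s1, s2])
print(stacked.index.tolist())  # [0, 1, 2, 0, 1, 2]

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]

# verify_integrity=True raises ValueError if the result would contain duplicate labels
raised = False
try:
    pd.concat([s1, s2], verify_integrity=True)
except ValueError as err:
    raised = True
    print("duplicate labels rejected:", err)
```

verify_integrity adds an extra check on the result index. It is only worth enabling when duplicate labels would indicate a bug.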
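1. Concatenation: concat
# Concatenation: concat
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])
s3 = pd.Series([1, 2, 3], index=['a', 'c', 'h'])
s4 = pd.Series([2, 3, 4], index=['b', 'e', 'd'])
# Default axis=0: rows are stacked (row + row)
print(pd.concat([s1, s2]))
print(pd.concat([s3, s4]).sort_index())
print('-----')
# axis=1: columns are placed side by side (column + column), producing a DataFrame
# (newer pandas may keep the union index in order of appearance unless sort=True is passed)
print(pd.concat([s3, s4], axis=1))
print('-----')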
0    1
1    2
2    3
0    2
1    3
2    4
dtype: int64
a    1
b    2
c    2
d    4
e    3
h    3
dtype: int64
-----
     0    1
a  1.0  NaN
b  NaN  2.0
c  2.0  NaN
d  NaN  4.0
e  NaN  3.0
h  3.0  NaN
-----
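The examples above concatenate Series. The same axis rules apply to DataFrames. This is a small sketch with two made-up frames, df_a and df_b, whose columns only partly overlap:

```python
import pandas as pd

# Hypothetical frames: they share column 'y' only, and their row labels differ
df_a = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
df_b = pd.DataFrame({'y': [5, 6], 'z': [7, 8]}, index=['r3', 'r4'])

# axis=0 stacks rows; the columns are unioned (join='outer' by default)
# and cells missing from one input become NaN
rows = pd.concat([df_a, df_b], axis=0)
print(rows)

# axis=1 places the frames side by side; the row labels do not overlap,
# so the result has 4 rows, and column 'y' appears twice
cols = pd.concat([df_a, df_b], axis=1)
print(cols.shape)  # (4, 4)
```

pd.concat never merges same-named columns when axis=1. Use pd.merge or DataFrame.join when you need key-based matching.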
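2. Join type: join, join_axes
# Join type: join, join_axes
s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])
print(pd.concat([s5, s6], axis=1))
print(pd.concat([s5, s6], axis=1, join='inner'))
# join_axes=[['a', 'b', 'd']] was removed in pandas 1.0; reindexing the result replaces it
print(pd.concat([s5, s6], axis=1).reindex(['a', 'b', 'd']))
# join: {'inner', 'outer'}, default 'outer'. Controls how the index on the other axis is combined: 'outer' takes the union, 'inner' the intersection.
# join_axes: explicitly specified which index labels to keep (older pandas only)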
     0    1
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
   0  1
b  2  2
c  3  3
     0    1
a  1.0  NaN
b  2.0  2.0
d  NaN  4.0
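On pandas 1.0 or later, join_axes no longer exists. The deprecation message recommends calling .reindex() on the concatenated result instead. The sketch below shows that replacement and also shows join='inner' with axis=0 on two hypothetical DataFrames. In that case only the columns shared by every input survive:

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])

# Replacement for the removed join_axes=[['a', 'b', 'd']]:
# outer-join first, then keep exactly the wanted row labels
picked = pd.concat([s5, s6], axis=1).reindex(['a', 'b', 'd'])
print(picked)

# Hypothetical frames: join='inner' with axis=0 keeps only the shared column 'y'
df_a = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df_b = pd.DataFrame({'y': [5, 6], 'z': [7, 8]})
inner = pd.concat([df_a, df_b], join='inner', ignore_index=True)
print(inner)
```
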
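3. Overriding column names with keys (rarely used; for reference)
# keys: overriding column names (rarely used; for reference)
sre = pd.concat([s5, s6], keys=['one', 'two'])
print(sre, type(sre))
# The output below comes from an older pandas; newer versions print the MultiIndex as a list of tuples
print(sre.index)
print('-----')
# keys: sequence, default None. The passed keys become the outermost level of a hierarchical index (MultiIndex)
sre = pd.concat([s5, s6], axis=1, keys=['one', 'two'])
print(sre, type(sre))
# With axis=1, the keys replace the column names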
one  a    1
     b    2
     c    3
two  b    2
     c    3
     d    4
dtype: int64 <class 'pandas.core.series.Series'>
MultiIndex(levels=[['one', 'two'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 1, 2, 3]])
-----
   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0 <class 'pandas.core.frame.DataFrame'>
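The hierarchical index built from keys is easier to work with once its levels have names. The names parameter from the signature does this. The level names 'source' and 'label' below are illustrative choices:

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])

# names labels the levels of the MultiIndex built from keys
sre = pd.concat([s5, s6], keys=['one', 'two'], names=['source', 'label'])
print(sre.index.names)

# Selecting on the outer key recovers one original piece
print(sre.loc['two'])

# xs() selects by an inner-level label across all sources
print(sre.xs('b', level='label'))
```
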
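4. Patching: DataFrame.combine_first()
# Patching: DataFrame.combine_first()
df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan], [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], index=[1, 2])
print(df1)
print(df2)
print(df1.combine_first(df2))
print('-----')
# Values are aligned by index: NaNs in df1 are filled with df2's values at the same position
# If df2 has index labels that df1 lacks (e.g. index=['a', 1]), those rows are added to the result as well
df1.update(df2)
print(df1)
# update: overwrites df1 in place with df2's non-NaN values at matching index/column positions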
     0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN  NaN
2  NaN  7.0  NaN
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
     0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN -8.2
2 -5.0  7.0  4.0
-----
      0    1    2
0   NaN  3.0  5.0
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
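The difference between combine_first and update is easy to miss. This sketch puts them side by side using hypothetical one-column frames, left and right. It also shows update's overwrite=False option:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'v': [1.0, np.nan, 3.0]}, index=[0, 1, 2])
right = pd.DataFrame({'v': [10.0, 20.0, np.nan, 40.0]}, index=[0, 1, 2, 3])

# combine_first: left's values win; its NaNs are filled from right;
# the result index is the union, so row 3 is added
patched = left.combine_first(right)
print(patched)    # v: 1.0, 20.0, 3.0, 40.0

# update: modifies the caller in place and returns None; right's non-NaN values win,
# NaNs in right never overwrite, and row 3 (only in right) is ignored
upd = left.copy()
result = upd.update(right)
print(result)     # None
print(upd)        # v: 10.0, 20.0, 3.0

# overwrite=False: only the caller's NaNs are filled, keeping the caller's index
fill_only = left.copy()
fill_only.update(right, overwrite=False)
print(fill_only)  # v: 1.0, 20.0, 3.0
```

So combine_first returns a new frame that can grow. update never changes the shape of the frame it modifies.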
Exercises
Exercise 1: Create DataFrames df1 and df2 as specified and concatenate them into df3
# Exercise 1 (df9 and df10 play the roles of df1 and df2)
df9 = pd.DataFrame(np.random.rand(4, 2),
                   index=['a', 'b', 'c', 'd'],
                   columns=['values1', 'values2'])
df10 = pd.DataFrame(np.random.rand(4, 2),
                    index=['e', 'f', 'g', 'h'],
                    columns=['values1', 'values2'])
print('df9:\n', df9)
print('------------------')
print('df10:\n', df10)
print('------------------')
print(pd.concat([df9, df10], axis=0))
df9:
    values1   values2
a  0.318752  0.892507
b  0.549810  0.079431
c  0.256229  0.940616
d  0.503575  0.000695
------------------
df10:
    values1   values2
e  0.990803  0.063776
f  0.930142  0.131978
g  0.333706  0.502107
h  0.291340  0.189546
------------------
    values1   values2
a  0.318752  0.892507
b  0.549810  0.079431
c  0.256229  0.940616
d  0.503575  0.000695
e  0.990803  0.063776
f  0.930142  0.131978
g  0.333706  0.502107
h  0.291340  0.189546
Exercise 2: Create DataFrames df1 and df2 as specified, then patch df1 with the values of df2 to produce df3
# Exercise 2 (df11 and df12 play the roles of df1 and df2)
df11 = pd.DataFrame(np.random.rand(4, 2),
                    index=['a', 'b', 'c', 'd'],
                    columns=['values1', 'values2'])
df12 = pd.DataFrame(np.stack([np.arange(0, 7, 2), np.arange(1, 8, 2)], axis=1),
                    index=['a', 'b', 'c', 'd'],
                    columns=['values1', 'values2'])
# Assign through a single .loc call; chained indexing (df11['values1'].loc[...] = ...)
# may modify a copy instead of df11
df11.loc[['b', 'c'], 'values1'] = np.nan
print('df11:\n', df11)
print('------------------')
print('df12:\n', df12)
print('------------------')
print('After patching:\n', df11.combine_first(df12))
df11:
    values1   values2
a  0.154783  0.190205
b       NaN  0.639978
c       NaN  0.324346
d  0.949075  0.849396
------------------
df12:
   values1  values2
a        0        1
b        2        3
c        4        5
d        6        7
------------------
After patching:
    values1   values2
a  0.154783  0.190205
b  2.000000  0.639978
c  4.000000  0.324346
d  0.949075  0.849396
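When both frames share exactly the same index and columns, as in Exercise 2, DataFrame.fillna() with another DataFrame as the fill value gives the same numbers as combine_first. The sketch below uses a seeded generator so it is reproducible. The seed is an arbitrary choice:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
df11 = pd.DataFrame(rng.random((4, 2)), index=list('abcd'), columns=['values1', 'values2'])
df12 = pd.DataFrame(np.arange(8).reshape(4, 2), index=list('abcd'), columns=['values1', 'values2'])
df11.loc[['b', 'c'], 'values1'] = np.nan

# fillna aligns df12 on index and columns and fills only df11's NaNs
filled = df11.fillna(df12)
combined = df11.combine_first(df12)
print(filled)
print(np.allclose(filled.to_numpy(), combined.to_numpy()))
```

The two differ when the labels do not match. fillna keeps df11's shape, while combine_first also adds any rows or columns that exist only in df12.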
Source: CSDN
Author: 圻子-
Link: https://blog.csdn.net/weixin_44507435/article/details/104909202