Question
This is a continuation of the post below:
How to add/insert output of a function call that returns multiple fields, as new columns into Pandas dataframe?
If a function takes two different DataFrames as arguments and returns multiple fields, how can I use apply() to add all of those fields as new columns to a new pandas DataFrame?
Sample code:
import pandas as pd
from pandas import DataFrame
People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]
df1 = DataFrame(People_List, columns=['First_Name','Last_Name','Age'])
Address_List = [['Jon','Chicago'],['Mark','SFO'],['Maria','Chicago'],['Jill','Chicago'],['Jack','Chicago']]
df2 = DataFrame(Address_List, columns=['First_Name', 'City'])
print(df1, df2)
  First_Name Last_Name  Age
0        Jon     Smith   21
1       Mark     Brown   38
2      Maria       Lee   42
3       Jill     Jones   28
4       Jack      Ford   55
  First_Name     City
0        Jon  Chicago
1       Mark      SFO
2      Maria  Chicago
3       Jill  Chicago
4       Jack  Chicago
def getTitleBirthYear(df1, df2):
    if 'Maria' in df1.First_Name:
        title='Ms'
    else:
        title='Mr'
    current_year = int('2020')
    birth_year=''
    age = df1.Age
    birth_year = current_year - age
    if 'Chicago' in df2.City:
        state='IL'
    else:
        state='Other'
    return title,birth_year,state
    #return {'title':title,'birth_year':birth_year, 'state':state}
getTitleBirthYear(df1,df2)
  title  birth_year  state
0    Mr        1999     IL
1    Mr        1982  Other
2    Ms        1978     IL
3    Mr        1992     IL
4    Mr        1965     IL
df = DataFrame.merge(df1,df2,on='First_Name',how='inner')
print(df)
  First_Name Last_Name  Age     City
0        Jon     Smith   21  Chicago
1       Mark     Brown   38      SFO
2      Maria       Lee   42  Chicago
3       Jill     Jones   28  Chicago
4       Jack      Ford   55  Chicago
df['title', 'birth_year', 'state'] = pd.DataFrame(df.apply(getTitleBirthYear,axis=1).tolist())
However, I get the following error: TypeError: ("getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0')
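The error comes from how DataFrame.apply works with axis=1: it calls the function once per row and passes that row as a single Series, so a function that requires a second positional argument cannot be satisfied. A minimal sketch reproducing this, assuming a toy DataFrame and a hypothetical two-argument function `needs_two` standing in for getTitleBirthYear(df1, df2):

```python
import pandas as pd

df = pd.DataFrame({'First_Name': ['Jon', 'Mark'], 'Age': [21, 38]})

def needs_two(a, b):
    # hypothetical function mirroring the two-argument getTitleBirthYear(df1, df2)
    return a, b

# apply(axis=1) calls needs_two(row) with only one argument, so this raises TypeError
try:
    df.apply(needs_two, axis=1)
    error = None
except TypeError as exc:
    error = exc

# Each row is passed in as a pandas Series indexed by the column names
first_row_type = df.apply(lambda row: type(row).__name__, axis=1).iloc[0]

print(error)
print(first_row_type)
```

Older pandas versions append "occurred at index 0" to the message, as in the question; newer versions raise the plain TypeError.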
Final expected output:
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
Answer 1:
I think you need numpy.where together with Series.rsub (subtraction from the right side, i.e. 2020 - Age) instead of your function:
import numpy as np
df = df1.merge(df2,on='First_Name')
df['title'] = np.where(df['First_Name'].eq('Maria'), 'Ms', 'Mr')
df['birth_year'] = df['Age'].rsub(2020)
df['state'] = np.where(df['City'].eq('Chicago'), 'IL', 'Other')
print (df)
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
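As a small illustration of the two building blocks: numpy.where picks between two values element-wise based on a boolean mask, and Series.rsub(2020) computes 2020 - value for each element (the reversed form of sub). The sample Series below are made up for the demo:

```python
import numpy as np
import pandas as pd

names = pd.Series(['Jon', 'Maria', 'Jill'])
ages = pd.Series([21, 42, 28])

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
titles = np.where(names.eq('Maria'), 'Ms', 'Mr')

# rsub means "reverse subtract": 2020 - ages, not ages - 2020
birth_years = ages.rsub(2020)

print(titles.tolist())       # ['Mr', 'Ms', 'Mr']
print(birth_years.tolist())  # [1999, 1978, 1992]
```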
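Your approach can also work if you change it as follows: pass result_type='expand' to DataFrame.apply, assign the result to a list of columns ['title', 'birth_year', 'state'] (note the added []), and change the function to check values with == instead of in. Because apply with axis=1 passes one row at a time, the function should also take a single argument (a row of the merged DataFrame).
However, this solution is slower and more complicated, so it is better to use the first one.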
def getTitleBirthYear(x):
    if x.First_Name == 'Maria' :
        title='Ms'
    else:
        title='Mr'
    current_year = int('2020')
    birth_year=''
    age = x.Age
    birth_year = current_year - age
    if x.City == 'Chicago':
        state='IL'
    else:
        state='Other'
    return title,birth_year,state
df = df1.merge(df2,on='First_Name')
df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear,
                                                axis=1,
                                                result_type='expand')
print (df)
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
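To see what result_type='expand' changes, compare it with the default on a function that returns a tuple. By default, apply(axis=1) returns a Series whose elements are tuples; with 'expand', each tuple becomes a row of a new DataFrame with columns 0, 1, ..., which can then be assigned to a list of column names. A minimal sketch, assuming a toy DataFrame and a hypothetical row-wise function `title_and_year`:

```python
import pandas as pd

df = pd.DataFrame({'First_Name': ['Jon', 'Maria'], 'Age': [21, 42]})

def title_and_year(row):
    # hypothetical row-wise function returning two fields as a tuple
    title = 'Ms' if row.First_Name == 'Maria' else 'Mr'
    return title, 2020 - row.Age

# Default: a single Series whose values are tuples
as_tuples = df.apply(title_and_year, axis=1)

# result_type='expand': the tuple elements become columns 0 and 1
expanded = df.apply(title_and_year, axis=1, result_type='expand')

# Assigning to a list of names (double brackets) creates one new column per element
df[['title', 'birth_year']] = expanded
print(df)
```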
Answer 2:
Your function doesn't need two input arguments if you merge the DataFrames beforehand:
df = df1.merge(df2,on='First_Name')
def getTitleBirthYear(x):
    if x.First_Name == 'Maria' :
        title='Ms'
    else:
        title='Mr'
    current_year = int('2020')
    birth_year=''
    age = x.Age
    birth_year = current_year - age
    if x.City == 'Chicago':
        state='IL'
    else:
        state='Other'
    return title,birth_year,state
However, as @jezrael stated, this approach is much slower (read more here).
df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
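The speed difference both answers mention comes from apply(axis=1) calling a Python function once per row, while the np.where/rsub version operates on whole columns at once. A rough benchmark sketch, assuming a synthetic 10,000-row DataFrame built by repeating the sample data; actual timings depend on the machine and pandas version, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumption for the benchmark): the sample rows repeated 2,000 times
base = pd.DataFrame({
    'First_Name': ['Jon', 'Mark', 'Maria', 'Jill', 'Jack'],
    'Age': [21, 38, 42, 28, 55],
    'City': ['Chicago', 'SFO', 'Chicago', 'Chicago', 'Chicago'],
})
df = pd.concat([base] * 2_000, ignore_index=True)

def row_wise(row):
    # per-row Python function, as in the apply-based answers
    title = 'Ms' if row.First_Name == 'Maria' else 'Mr'
    state = 'IL' if row.City == 'Chicago' else 'Other'
    return title, 2020 - row.Age, state

def with_apply(frame):
    return frame.apply(row_wise, axis=1, result_type='expand')

def vectorized(frame):
    # same three fields computed column-wise, keyed 0/1/2 to match the apply output
    return pd.DataFrame({
        0: np.where(frame['First_Name'].eq('Maria'), 'Ms', 'Mr'),
        1: frame['Age'].rsub(2020),
        2: np.where(frame['City'].eq('Chicago'), 'IL', 'Other'),
    })

t_apply = timeit.timeit(lambda: with_apply(df), number=1)
t_vec = timeit.timeit(lambda: vectorized(df), number=1)
print(f'apply: {t_apply:.3f}s  vectorized: {t_vec:.3f}s')
```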
Source: https://stackoverflow.com/questions/65122321/how-to-add-new-columns-into-a-new-dataframe-using-output-of-single-function-call