Question
I have a dataframe in which I have categorical as well as numerical columns.
import pandas as pd

data = [['A',"India",10,20,30,15,"Cochin"],['B',"India",10,20,30,40,"Chennai"],['C',"India",10,20,30,15,"Chennai"]]
df = pd.DataFrame(data,columns=['Product','Country',"2016 Total","2017 Total","2018 Total","2019 Total","Region"])
Product Country 2016 Total 2017 Total 2018 Total 2019 Total Region
0 A India 10 20 30 15 Cochin
1 B India 10 20 30 40 Chennai
2 C India 10 20 30 15 Chennai
I know in advance what the numerical columns will be called, but the names need to be generated dynamically:
import datetime
import numpy as np

start_year = 2016
current_year = datetime.datetime.now().year
previous_year = current_year - 1
print(current_year)
year_list = np.arange(start_year, current_year+1, 1)
cols_list = []
for i in year_list:
    if i <= current_year:
        cols = str(i)+" Total"
        cols_list.append(cols)
cols_list
['2016 Total', '2017 Total', '2018 Total', '2019 Total']
I am trying to determine, for each row, whether the product of the values in the cols_list columns is negative.
How can this be done in pandas? I cannot figure out how to loop through cols_list, pull those columns from the dataframe, and multiply them.
Expected output:
Product Country 2016 Total 2017 Total 2018 Total 2019 Total Region Negative
0 A India 10 20 30 15 Cochin No
1 B India 10 20 30 40 Chennai No
2 C India 10 20 30 15 Chennai No
Answer 1:
Use numpy.where with a condition built from DataFrame.prod and Series.lt (for < 0):
# build cols_list from the year range using f-strings
cols_list = [f'{x} Total' for x in np.arange(start_year, current_year+1)]
print (cols_list)
['2016 Total', '2017 Total', '2018 Total', '2019 Total']
df['Negative'] = np.where(df[cols_list].prod(axis=1).lt(0), 'Yes', 'No')
print (df)
Product Country 2016 Total 2017 Total 2018 Total 2019 Total Region \
0 A India 10 20 30 15 Cochin
1 B India 10 20 30 40 Chennai
2 C India 10 20 30 15 Chennai
Negative
0 No
1 No
2 No
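Since every row in the sample data is positive, the 'Yes' branch never shows up above. Here is a minimal self-contained sketch of the same np.where approach on hypothetical data that contains negative totals. The values, and the fixed end_year (used instead of the current year so the result is reproducible), are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): B has one negative total, C has two.
df = pd.DataFrame(
    {
        "Product": ["A", "B", "C"],
        "2016 Total": [10, -10, 10],
        "2017 Total": [20, 20, -20],
        "2018 Total": [30, 30, -30],
    }
)

# Fixed year range so the example does not depend on today's date.
start_year, end_year = 2016, 2018
cols_list = [f"{year} Total" for year in range(start_year, end_year + 1)]

# Row-wise product of the selected columns, then flag rows whose product is below zero.
row_product = df[cols_list].prod(axis=1)
df["Negative"] = np.where(row_product.lt(0), "Yes", "No")

print(df[["Product", "Negative"]])
```

Only B is flagged: C has two negative values, so its product is positive again.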
Answer 2:
You can use df.filter() to select the columns whose names contain Total (similar result to your cols_list), then df.prod() over axis=1, then s.map():
df['Negative']=df.filter(like='Total').prod(axis=1).lt(0).map({True:'Yes',False:'No'})
print(df)
Product Country 2016 Total 2017 Total 2018 Total 2019 Total Region \
0 A India 10 20 30 15 Cochin
1 B India 10 20 30 40 Chennai
2 C India 10 20 30 15 Chennai
Negative
0 No
1 No
2 No
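One caveat: filter(like='Total') keeps every column whose name contains 'Total', not only the yearly ones. If the frame might hold other such columns, a stricter regex is one option. A rough sketch follows; the "Grand Total" column and the pattern are hypothetical, so adjust them to your real column names.

```python
import pandas as pd

# Hypothetical frame with an extra column that also contains the word "Total".
df = pd.DataFrame(
    {
        "Product": ["A", "B"],
        "2016 Total": [10, -10],
        "2017 Total": [20, 20],
        "Grand Total": [30, 10],
    }
)

# like= does a substring match, so "Grand Total" is picked up as well.
loose = df.filter(like="Total").columns.tolist()

# regex= keeps only names of the form "<four digits> Total".
strict = df.filter(regex=r"^\d{4} Total$").columns.tolist()

# Same product / lt / map chain as above, restricted to the yearly columns.
df["Negative"] = (
    df.filter(regex=r"^\d{4} Total$")
    .prod(axis=1)
    .lt(0)
    .map({True: "Yes", False: "No"})
)

print(loose)
print(strict)
print(df)
```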
Answer 3:
Try this:
df['Negative'] = df[cols_list].T.product().apply(lambda x: x < 0)
The df[cols_list].T there transposes the columns into rows, so each original row becomes a column. product() multiplies down each column by default, which gives one product per original row with a single function call.
Step-by-step:
>>> t = df[cols_list].T
>>> t
             0   1   2
2016 Total  10  10  10
2017 Total  20  20  20
2018 Total  30  30  30
2019 Total  15  40  15
>>> p = t.product()
>>> p
0     90000
1    240000
2     90000
dtype: int64
>>> neg = p.apply(lambda x: x < 0)
>>> neg
0 False
1 False
2 False
dtype: bool
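Note that this gives a boolean column rather than the 'Yes'/'No' labels in the expected output. A small sketch, on hypothetical data, of turning the booleans into labels and of the equivalent form without the transpose or apply:

```python
import pandas as pd

# Hypothetical two-row frame: the second row has a negative product.
df = pd.DataFrame({"2016 Total": [10, -10], "2017 Total": [20, 20]})
cols_list = ["2016 Total", "2017 Total"]

# Transpose-based version from this answer: one boolean per original row.
via_transpose = df[cols_list].T.product().apply(lambda x: x < 0)

# Equivalent without transposing: multiply across the columns of each row.
via_axis = df[cols_list].prod(axis=1) < 0

# Map the booleans to the labels used in the question's expected output.
df["Negative"] = via_axis.map({True: "Yes", False: "No"})

print(via_transpose.tolist())
print(via_axis.tolist())
print(df)
```

Both forms produce the same flags; prod(axis=1) just skips the intermediate transpose and the Python-level lambda.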
Source: https://stackoverflow.com/questions/55042354/multiply-columns-of-a-dataframe-by-getting-the-column-names-from-a-list