Multiply columns of a dataframe by getting the column names from a list

Question

I have a dataframe in which I have categorical as well as numerical columns.

import pandas as pd

data = [['A',"India",10,20,30,15,"Cochin"],['B',"India",10,20,30,40,"Chennai"],['C',"India",10,20,30,15,"Chennai"]]
df = pd.DataFrame(data,columns=['Product','Country',"2016 Total","2017 Total","2018 Total","2019 Total","Region"])

  Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region
0       A   India          10          20          30          15   Cochin
1       B   India          10          20          30          40  Chennai
2       C   India          10          20          30          15  Chennai

I know in advance what the numerical columns will be named, but the names need to be generated dynamically:

import datetime
import numpy as np

start_year = 2016
current_year = datetime.datetime.now().year
previous_year = current_year - 1
print(current_year)

# Years from start_year through the current year, inclusive
year_list = np.arange(start_year, current_year+1, 1)

# Build the column names, e.g. '2016 Total'
cols_list = []
for i in year_list:
    cols = str(i)+" Total"
    cols_list.append(cols)
cols_list

['2016 Total', '2017 Total', '2018 Total', '2019 Total']

For each row, I am trying to determine whether the product of the values in the cols_list columns is negative.

How can this be done in pandas? I can't figure out how to loop through cols_list, pull those columns from the dataframe, and multiply them.

Expected output:

  Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region Negative
0       A   India          10          20          30          15   Cochin       No
1       B   India          10          20          30          40  Chennai       No
2       C   India          10          20          30          15  Chennai       No

Answer 1:

Use numpy.where with a condition built from DataFrame.prod (row-wise product, axis=1) and Series.lt (to test for < 0):

# build cols_list with an f-string over the year range
cols_list = [f'{x} Total' for x in np.arange(start_year, current_year+1)]
print(cols_list)
['2016 Total', '2017 Total', '2018 Total', '2019 Total']

df['Negative'] = np.where(df[cols_list].prod(axis=1).lt(0), 'Yes', 'No')
print (df)
  Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region  \
0       A   India          10          20          30          15   Cochin   
1       B   India          10          20          30          40  Chennai   
2       C   India          10          20          30          15  Chennai   

  Negative  
0       No  
1       No  
2       No
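
Every value in the sample data is positive, so the output above never shows a 'Yes'. As a minimal sketch, assuming one value is made negative (the -20 in row B is hypothetical and not part of the original data), the flag flips like this:

```python
import numpy as np
import pandas as pd

# Same layout as the question's data, but with one value made negative.
# The -20 is an assumption for illustration only.
data = [['A', "India", 10, 20, 30, 15, "Cochin"],
        ['B', "India", 10, -20, 30, 40, "Chennai"],
        ['C', "India", 10, 20, 30, 15, "Chennai"]]
df = pd.DataFrame(data, columns=['Product', 'Country', "2016 Total", "2017 Total",
                                 "2018 Total", "2019 Total", "Region"])

# Years are fixed here so the example does not depend on the current date.
cols_list = [f'{year} Total' for year in range(2016, 2020)]

# Row-wise product of the selected columns, then flag the negative products.
row_product = df[cols_list].prod(axis=1)
df['Negative'] = np.where(row_product.lt(0), 'Yes', 'No')
print(df[['Product', 'Negative']])
```

Keep in mind that a row containing an even number of negative values has a positive product and is flagged 'No', and a row containing a zero has a product of 0, which is also 'No'.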

Answer 2:

You can use df.filter() to select the columns whose names contain 'Total' (the same columns as your cols_list), take the row-wise product with prod(axis=1), and then map the boolean result to 'Yes'/'No' with Series.map():

df['Negative']=df.filter(like='Total').prod(axis=1).lt(0).map({True:'Yes',False:'No'})
print(df)

  Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region  \
0       A   India          10          20          30          15   Cochin   
1       B   India          10          20          30          40  Chennai   
2       C   India          10          20          30          15  Chennai   

  Negative  
0       No  
1       No  
2       No
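
One caveat: filter(like='Total') keeps every column whose label contains the substring 'Total', which may be more than the yearly columns. A rough sketch, assuming a hypothetical extra 'Grand Total' column that is not in the original data, compares it with a stricter regex filter:

```python
import pandas as pd

# Hypothetical frame with an extra 'Grand Total' column (an assumption,
# not part of the question's data).
df = pd.DataFrame({
    'Product': ['A', 'B'],
    '2016 Total': [10, -10],
    '2017 Total': [20, 20],
    'Grand Total': [30, 10],
})

# like='Total' matches any label containing the substring 'Total'.
loose = df.filter(like='Total').columns.tolist()

# The regex restricts the match to labels of the form '<four digits> Total'.
strict = df.filter(regex=r'^\d{4} Total$').columns.tolist()

print(loose)   # includes 'Grand Total'
print(strict)  # only the yearly columns

# Row-wise product over the yearly columns only, mapped to 'Yes'/'No'.
df['Negative'] = (df.filter(regex=r'^\d{4} Total$')
                    .prod(axis=1)
                    .lt(0)
                    .map({True: 'Yes', False: 'No'}))
print(df)
```

If the column names are already known, as with cols_list in the question, selecting them explicitly with df[cols_list] avoids the ambiguity altogether.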

Answer 3:

Try this:

df['Negative'] = df[cols_list].T.product().apply(lambda x: x < 0)

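The df[cols_list].T transposes the selected columns into rows. Since product() multiplies down each column by default, calling it on the transposed frame gives one product per original row in a single function call. Note that the result is a boolean column (True/False) rather than 'Yes'/'No'.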

Step-by-step:

>>> t = df[cols_list].T
>>> t
             0   1   2
2016 Total  10  10  10
2017 Total  20  20  20
2018 Total  30  30  30
2019 Total  15  40  15

>>> p = t.product()
>>> p
0     90000
1    240000
2     90000
dtype: int64

>>> neg = p.apply(lambda x: x < 0)
>>> neg
0    False
1    False
2    False
dtype: bool
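
The transpose is not strictly required. As a sketch using the question's data (with cols_list written out by hand so the example does not depend on the current year), prod(axis=1) gives the same row-wise products, and the boolean result can be mapped to the 'Yes'/'No' labels the question asks for:

```python
import pandas as pd

data = [['A', "India", 10, 20, 30, 15, "Cochin"],
        ['B', "India", 10, 20, 30, 40, "Chennai"],
        ['C', "India", 10, 20, 30, 15, "Chennai"]]
df = pd.DataFrame(data, columns=['Product', 'Country', "2016 Total", "2017 Total",
                                 "2018 Total", "2019 Total", "Region"])
cols_list = ['2016 Total', '2017 Total', '2018 Total', '2019 Total']

# Product per original row, computed two ways.
by_transpose = df[cols_list].T.product()   # transpose, then multiply down columns
by_axis = df[cols_list].prod(axis=1)       # multiply across each row directly
print(by_transpose.tolist() == by_axis.tolist())

# A vectorized comparison replaces apply(lambda x: x < 0).
neg = by_axis < 0

# Map the booleans to the labels used in the expected output.
df['Negative'] = neg.map({True: 'Yes', False: 'No'})
print(df[['Product', 'Negative']])
```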

Source: https://stackoverflow.com/questions/55042354/multiply-columns-of-a-dataframe-by-getting-the-column-names-from-a-list

Tags

python-3.x

pandas

dataframe