Python: Divide each row of a DataFrame by another DataFrame vector

佛祖请我去吃肉 2020-12-08 20:26

I have a DataFrame (df1) with a dimension 2000 rows x 500 columns (excluding the index) for which I want to divide each row by another DataFrame (df2) with dime

5 answers
  • 2020-12-08 20:43

    In df.divide(df2, axis='index') you are passing the whole of df2. You need to pass a single row of df2 instead (e.g. df2.iloc[0]) and divide along the columns:

    import pandas as pd
    
    data1 = {"a":[1.,3.,5.,2.],
             "b":[4.,8.,3.,7.],
             "c":[5.,45.,67.,34]}
    data2 = {"a":[4.],
             "b":[2.],
             "c":[11.]}
    
    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2) 
    
    df1.div(df2.iloc[0], axis='columns')
    

    Alternatively, you can use df1/df2.values[0,:], which divides by position rather than by column label.
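The difference between the two forms can be sketched as follows. This is a minimal illustration reusing the data from this answer; the names by_label, by_position and df2_shuffled are introduced here only for the comparison.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1., 3., 5., 2.],
                    "b": [4., 8., 3., 7.],
                    "c": [5., 45., 67., 34.]})
df2 = pd.DataFrame({"a": [4.], "b": [2.], "c": [11.]})

# Label-aligned: the Series index (a, b, c) is matched against df1's columns.
by_label = df1.div(df2.iloc[0], axis="columns")

# Positional: a plain NumPy row of length 3 is broadcast across every row of df1.
by_position = df1 / df2.values[0, :]

# With df2's columns reordered, the label-aligned result is unchanged,
# while the positional one pairs each column with the wrong divisor.
df2_shuffled = df2[["c", "a", "b"]]
by_label_shuffled = df1.div(df2_shuffled.iloc[0], axis="columns")
by_position_shuffled = df1 / df2_shuffled.values[0, :]
```

When the column order of both frames is known to match, the two forms should agree; the label-aligned one is the safer default when it might not.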

  • 2020-12-08 20:51

    A small clarification, just in case: you got NaN everywhere, while Andy's first example (df.div(df2)) works for the first row, because div tries to match indexes (and columns). In Andy's example, index 0 is found in both DataFrames, so the division is carried out for that row; index 1 is not, so that row is filled with NaN. This behavior becomes more obvious if you run the following (only the 't' row is divided):

    import numpy as np
    import pandas as pd

    df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'])
    df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'])
    df_a.div(df_b)
    

    So in your case, the index of the only row of df2 was apparently not present in df1. "Luckily", the column headers are the same in both DataFrames, so when you slice the first row you get a Series whose index consists of df2's column headers. This is what ultimately allows the division to work properly.

    For a case where both the index and the columns are matched:

    df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'], columns = range(5))
    df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'], columns = [1,2,3,4,5])
    df_a.div(df_b)
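To make the outcome of this two-way matching easier to inspect, here is a small sketch that swaps the random values for constants (an assumption made purely for readability) and records which cells end up with a number.

```python
import numpy as np
import pandas as pd

# Constant values instead of random ones, so the result is easy to read.
df_a = pd.DataFrame(np.ones((3, 5)), index=["x", "y", "t"], columns=range(5))
df_b = pd.DataFrame(np.full((2, 5), 2.0), index=["z", "t"], columns=[1, 2, 3, 4, 5])

result = df_a.div(df_b)

# The result spans the union of both indexes and both column sets.
# Only cells whose row label AND column label exist in both frames hold a number;
# every other cell is NaN.
filled = result.notna()
```

Here only row 't' under columns 1 to 4 is present in both frames, so those four cells are the only ones that are divided.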
    
  • 2020-12-08 20:53

    If you want to divide every value in a column by a specific number, you could try:

    df['column_name'] = df['column_name'].div(10000)
    

    For me, this code divided each value in 'column_name' by 10,000.
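A self-contained version of that one-liner might look like the following. The frame and its values are made up for illustration; 'column_name' is the placeholder used above and 'other' is a hypothetical second column.

```python
import pandas as pd

# Hypothetical frame; 'column_name' stands in for whichever column needs scaling.
df = pd.DataFrame({"column_name": [10000., 25000., 5000.],
                   "other": [1., 2., 3.]})

# Divide a single column by a scalar; the remaining columns are left untouched.
df["column_name"] = df["column_name"].div(10000)
```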

  • 2020-12-08 20:59

    You can divide by a Series, i.e. the first row of df2:

    In [11]: df = pd.DataFrame([[1., 2.], [3., 4.]], columns=['A', 'B'])
    
    In [12]: df2 = pd.DataFrame([[5., 10.]], columns=['A', 'B'])
    
    In [13]: df.div(df2)
    Out[13]: 
         A    B
    0  0.2  0.2
    1  NaN  NaN
    
    In [14]: df.div(df2.iloc[0])
    Out[14]: 
         A    B
    0  0.2  0.2
    1  0.6  0.4
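The same pattern can be sketched at the size mentioned in the question (2000 rows x 500 columns). The question text is cut off before it gives the shape of df2, so the single-row 1 x 500 frame below is an assumption, as are the column labels and the random data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = [f"c{i}" for i in range(500)]  # hypothetical column labels

df1 = pd.DataFrame(rng.random((2000, 500)), columns=cols)
# Assumed shape: one row, same columns as df1; +1.0 keeps the divisors away from zero.
df2 = pd.DataFrame(rng.random((1, 500)) + 1.0, columns=cols)

# Guard against silent NaN columns caused by mismatched labels.
assert df1.columns.equals(df2.columns)

# df2.iloc[0] is a Series indexed by column label; div aligns it with df1's columns.
result = df1.div(df2.iloc[0])
```

Checking the column labels first is optional, but it turns a frame full of NaN into an immediate error.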
    
  • 2020-12-08 21:03

    To divide a single row (across one or more columns), do the following:

    df.loc['index_value'] = df.loc['index_value'].div(10000)
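As a runnable sketch of that row-wise division: the frame below is invented for illustration, 'index_value' is the placeholder label from the line above, and 'other_row' is a hypothetical second row.

```python
import pandas as pd

# Hypothetical frame; 'index_value' stands in for the label of the row to scale.
df = pd.DataFrame({"A": [10000., 1.], "B": [20000., 2.]},
                  index=["index_value", "other_row"])

# Select the row by label, divide it by a scalar, and write it back.
df.loc["index_value"] = df.loc["index_value"].div(10000)
```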
    