What is the meaning of “axis” attribute in a Pandas DataFrame?

前端 未结 5 1968
遇见更好的自我
遇见更好的自我 2021-02-04 02:28

Taking the following example:

>>> df1 = pd.DataFrame({\"x\":[1, 2, 3, 4, 5], 
                        \"y\":[3, 4, 5, 6, 7]}, 
                      ind         


        
5条回答
  •  刺人心
    刺人心 (楼主)
    2021-02-04 03:08

    First, OP misunderstood the rows and columns in his/her dataframe.

    But the acutal output considers rows that are found in both dataframes.(the only common row element 'y')

    OP thought the label y is for row. However, y is a column name.

    df1 = pd.DataFrame(
             {"x":[1, 2, 3, 4, 5],  # <-- looks like row x but actually col x
              "y":[3, 4, 5, 6, 7]}, # <-- looks like row y but actually col y
              index=['a', 'b', 'c', 'd', 'e'])
    print(df1)
    
                \col   x    y
     index or row\
              a       1     3   |   a
              b       2     4   v   x
              c       3     5   r   i
              d       4     6   o   s
              e       5     7   w   0
    
                   -> column
                     a x i s 1
    

    It is very easy to be misled since in the dictionary, it looks like y and x are two rows.

    If you generate df1 from a list of list, it should be more intuitive:

    df1 = pd.DataFrame([[1,3], 
                        [2,4],
                        [3,5],
                        [4,6],
                        [5,7]],
                        index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
    

    So back to the problem, concat is a shorthand for concatenate (means to link together in a series or chain on this way [source]) Performing concat along axis 0 means to linking two objects along axis 0.

       1
       1   <-- series 1
       1
    ^  ^  ^
    |  |  |               1
    c  a  a               1
    o  l  x               1
    n  o  i   gives you   2
    c  n  s               2
    a  g  0               2
    t  |  |
    |  V  V
    v 
       2
       2   <--- series 2
       2
    

    So... think you have the feeling now. What about sum function in pandas? What does sum(axis=0) means?

    Suppose data looks like

       1 2
       1 2
       1 2
    

    Maybe...summing along axis 0, you may guess. Yes!!

    ^  ^  ^
    |  |  |               
    s  a  a               
    u  l  x                
    m  o  i   gives you two values 3 6 !
    |  n  s               
    v  g  0               
       |  |
       V  V
    

    What about dropna? Suppose you have data

       1  2  NaN
      NaN 3   5
       2  4   6
    

    and you only want to keep

    2
    3
    4
    

    On the documentation, it says Return object with labels on given axis omitted where alternately any or all of the data are missing

    Should you put dropna(axis=0) or dropna(axis=1)? Think about it and try it out with

    df = pd.DataFrame([[1, 2, np.nan],
                       [np.nan, 3, 5],
                       [2, 4, 6]])
    
    # df.dropna(axis=0) or df.dropna(axis=1) ?
    

    Hint: think about the word along.

提交回复
热议问题