Intersection of two or more DataFrame columns

盖世英雄少女心 2020-12-18 12:24

I am trying to find the columns shared by three DataFrames, but np.intersect1d does not work when I pass it all three.

import numpy as np
imp         


        
3 Answers
  • 2020-12-18 12:43
    inclusive_list = np.intersect1d(np.intersect1d(df1.columns, df2.columns), df3.columns)
    

    Note that the arguments passed to np.intersect1d (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.intersect1d.html) are expected to be two arrays (ar1 and ar2).

    Passing three arrays means that the third one is bound to the assume_unique parameter, which is expected to be a bool.

    You can also use Python's built-in set methods if you don't want to use NumPy:

    inclusive_list = set(df1.columns).intersection(set(df2.columns)).intersection(set(df3.columns))
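
As a minimal sketch of the set-based route: set.intersection accepts several iterables in one call, so the chained calls can be collapsed. The three frames below are hypothetical stand-ins for df1, df2, and df3, since the question's data is not shown. Sets are unordered, so the result is sorted before use.

```python
import pandas as pd

# Hypothetical sample frames; only the column labels matter here.
df1 = pd.DataFrame(columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(columns=["C", "D", "E"])
df3 = pd.DataFrame(columns=["B", "C", "D", "F"])

# set.intersection takes any number of iterables in a single call.
common = set(df1.columns).intersection(df2.columns, df3.columns)

# Sets have no defined order, so sort for a reproducible list.
inclusive_list = sorted(common)
print(inclusive_list)
```

If the original column order of df1 matters, a list comprehension such as [c for c in df1.columns if c in common] keeps it.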
    
  • 2020-12-18 12:46

    Why your current approach doesn't work:

    intersect1d does not take N arrays; it only compares two.

    numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)

    You can see from the definition that you are passing the third array as the assume_unique parameter, and since you are treating an array like a single boolean, you receive a ValueError.
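
The failure can be reproduced with a small sketch. The arrays below are hypothetical sample data, and the exact error message may differ between NumPy versions.

```python
import numpy as np

# Hypothetical sample arrays standing in for the three column sets.
a = np.array(["A", "B", "C", "D"])
b = np.array(["C", "D", "E"])
c = np.array(["B", "C", "D"])

# The third positional argument binds to assume_unique. NumPy evaluates it
# as a boolean, and a multi-element array has no single truth value.
try:
    np.intersect1d(a, b, c)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```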


    You can extend the functionality of intersect1d to work on N arrays using functools.reduce:

    from functools import reduce
    reduce(np.intersect1d, (df1.columns, df2.columns, df3.columns))
    

    array(['C', 'D'], dtype=object)
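
The reduce pattern generalizes to any number of frames. The sketch below wraps it in a helper; the name common_columns and the sample frames are assumptions made for illustration, not part of NumPy or pandas.

```python
from functools import reduce

import numpy as np
import pandas as pd


def common_columns(*frames):
    """Return the column labels shared by all frames (hypothetical helper)."""
    # reduce applies np.intersect1d pairwise: first to the columns of
    # frames 1 and 2, then to that result and the columns of frame 3, etc.
    return reduce(np.intersect1d, (f.columns for f in frames))


# Hypothetical sample frames; only the column labels matter here.
df1 = pd.DataFrame(columns=["A", "B", "C", "D"])
df2 = pd.DataFrame(columns=["C", "D", "E"])
df3 = pd.DataFrame(columns=["B", "C", "D", "F"])

shared = common_columns(df1, df2, df3)
print(shared.tolist())
```

Note that np.intersect1d returns its result sorted, so the order of the labels in the original frames is not preserved.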
    

    A better approach

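    However, the easiest approach is to call intersection directly on the Index objects. (The shorthand df1.columns & df2.columns & df3.columns performed a set intersection in older pandas releases, but in pandas 2.0 and later & on an Index is an element-wise logical operation.)

    df1.columns.intersection(df2.columns).intersection(df3.columns)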
    

    Index(['C', 'D'], dtype='object')
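
Once the shared labels are known, they can be used to trim every frame to the same columns. The sketch below assumes hypothetical sample frames and folds Index.intersection over a list of frames with functools.reduce.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames with one row each.
df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4]})
df2 = pd.DataFrame({"C": [5], "D": [6], "E": [7]})
df3 = pd.DataFrame({"B": [8], "C": [9], "D": [10], "F": [11]})
frames = [df1, df2, df3]

# Fold Index.intersection across the column indexes of all frames.
common = reduce(pd.Index.intersection, (f.columns for f in frames))

# Restrict each frame to the shared columns.
trimmed = [f[common] for f in frames]
print(list(common))
```

Working with Index objects throughout keeps the result a pandas Index, which can be passed straight back into column selection.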
    
  • 2020-12-18 13:09

    You can also use concat with join='inner':

    pd.concat([df1.head(1), df2.head(1), df3.head(1)], join='inner').columns

    Index(['C', 'D'], dtype='object')
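
A self-contained sketch of the concat approach, using hypothetical sample frames. With join="inner", concat keeps only the labels present in every input on the axis that is not being concatenated, which here is the columns.

```python
import pandas as pd

# Hypothetical sample frames with one row each.
df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4]})
df2 = pd.DataFrame({"C": [5], "D": [6], "E": [7]})
df3 = pd.DataFrame({"B": [8], "C": [9], "D": [10], "F": [11]})

# head(1) keeps the concatenation cheap: only the column labels are needed,
# so there is no reason to stack every row of every frame.
stacked = pd.concat([df1.head(1), df2.head(1), df3.head(1)], join="inner")
common = stacked.columns
print(list(common))
```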
    