pandas three-way joining multiple dataframes on columns

醉梦人生 2020-11-22 08:35

I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.

How can

11 Answers
  •  青春惊慌失措
    2020-11-22 08:53

    Here is a way to merge a dictionary of data frames on shared key columns. Each non-key column is suffixed with its dictionary key, so the merged column names show which data frame they came from. Missing values can optionally be filled in.

    This is the function that merges a dict of data frames:

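    import pandas as pd

    def MergeDfDict(dfDict, onCols, how='outer', naFill=None):
      # Materialize the keys: dict views are not indexable in Python 3
      keys = list(dfDict.keys())
      for i in range(len(keys)):
        key = keys[i]
        df0 = dfDict[key]
        cols = list(df0.columns)
        # Value columns are everything that is not a join column
        valueCols = list(filter(lambda x: x not in (onCols), cols))
        df0 = df0[onCols + valueCols]
        # Suffix each value column with the dict key to keep names unique
        df0.columns = onCols + [(s + '_' + key) for s in valueCols]

        if (i == 0):
          outDf = df0
        else:
          outDf = pd.merge(outDf, df0, how=how, on=onCols)

      # Optionally fill values left missing by the join
      if (naFill is not None):
        outDf = outDf.fillna(naFill)

      return(outDf)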
    

    OK, let's generate some data and test this:

    import numpy as np
    import pandas as pd

    def GenDf(size):
      # Two categorical key columns and two random numeric value columns
      df = pd.DataFrame({'categ1':np.random.choice(a=['a', 'b', 'c', 'd', 'e'], size=size, replace=True),
                          'categ2':np.random.choice(a=['A', 'B'], size=size, replace=True), 
                          'col1':np.random.uniform(low=0.0, high=100.0, size=size), 
                          'col2':np.random.uniform(low=0.0, high=100.0, size=size)
                          })
      df = df.sort_values(['categ2', 'categ1', 'col1', 'col2'])
      return(df)
    
    
    size = 5
    # The dict keys become the suffixes of the value columns in the result
    dfDict = {'US':GenDf(size), 'IN':GenDf(size), 'GER':GenDf(size)}   
    MergeDfDict(dfDict=dfDict, onCols=['categ1', 'categ2'], how='outer', naFill=0)
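
For the case in the question (three frames keyed on a name column), the same idea can be sketched more compactly with functools.reduce. This is only an illustrative variant, not the code from the answer above: the function name merge_df_dict, the column names (name, age, city, score), the dict keys and the sample values are all made up for the example.

```python
from functools import reduce

import pandas as pd


def merge_df_dict(df_dict, on_cols, how="outer", na_fill=None):
    """Merge a dict of DataFrames on on_cols, suffixing value columns with the dict key."""
    renamed = []
    for key, df in df_dict.items():
        value_cols = [c for c in df.columns if c not in on_cols]
        # Tag each value column with its dict key so its origin stays visible.
        renamed.append(
            df[on_cols + value_cols].rename(
                columns={c: f"{c}_{key}" for c in value_cols}
            )
        )
    # Fold the list into a single frame, merging pairwise from left to right.
    merged = reduce(
        lambda left, right: pd.merge(left, right, how=how, on=on_cols), renamed
    )
    return merged if na_fill is None else merged.fillna(na_fill)


# Hypothetical data: three frames sharing a 'name' column.
df_dict = {
    "a": pd.DataFrame({"name": ["ann", "bob"], "age": [30, 41]}),
    "b": pd.DataFrame({"name": ["bob", "cat"], "city": ["Oslo", "Rome"]}),
    "c": pd.DataFrame({"name": ["ann", "cat"], "score": [7.5, 9.0]}),
}
result = merge_df_dict(df_dict, on_cols=["name"], how="outer")
print(result)
```

With how="outer", every name that appears in any frame is kept, and the result has the columns name, age_a, city_b and score_c. Attributes that a person lacks come out as NaN unless na_fill is given.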
    
