Python pandas: possible to compare 3 dfs of same shape using where(max())? Is this a masking issue?

拟墨画扇 submitted on 2020-01-24 13:23:29

Question


I have a dict containing 3 dataframes of identical shape. I would like to create:

  1. a 4th dataframe which identifies the largest value from the original 3 at each coordinate - so dic['four'].loc[0,'A'] = MAX( dic['one'].loc[0,'A'], dic['two'].loc[0,'A'], dic['three'].loc[0,'A'] )
  2. a 5th with the second largest value

    import numpy as np
    import pandas as pd
    dic = {}
    for i in ['one','two','three']:
        dic[i] = pd.DataFrame(np.random.randint(0,100,size=(10,3)), columns=list('ABC'))
    

I cannot figure out how to use .where() to compare the original 3 dfs. Looping through would be too inefficient for the eventual data set.


Answer 1:


Consider the dict dfs, which is a dictionary of pd.DataFrames:

import pandas as pd
import numpy as np

np.random.seed([3,1415])
dfs = dict(
    one=pd.DataFrame(np.random.randint(1, 10, (5, 5))),
    two=pd.DataFrame(np.random.randint(1, 10, (5, 5))),
    three=pd.DataFrame(np.random.randint(1, 10, (5, 5))),
)

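The most direct way to handle this is with a pd.Panel object, the three-dimensional analogue of pd.DataFrame. Note that pd.Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the code below only runs on older versions.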

p = pd.Panel(dfs)

Then the answers you need are very straightforward:

max
p.max(axis='items') or p.max(0)

penultimate
p.apply(lambda x: np.sort(x)[-2], axis=0)
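
Where pd.Panel is unavailable, the same two results can be sketched with plain NumPy by stacking the frames along a new leading axis and sorting along it. This is only a sketch under assumptions: the helper name ranked_frames is hypothetical, and it assumes all frames share the same shape, index and columns.

```python
import numpy as np
import pandas as pd

# Example data with the same layout as above (a local RandomState is used
# here so the sketch does not depend on the global seed).
rng = np.random.RandomState(0)
dfs = {
    name: pd.DataFrame(rng.randint(1, 10, (5, 5)))
    for name in ["one", "two", "three"]
}


def ranked_frames(frames):
    """Return (largest, second_largest) element-wise across the frames.

    Hypothetical helper: assumes every frame has the same shape and labels.
    """
    frames = list(frames)
    first = frames[0]
    # Stack into shape (n_frames, n_rows, n_cols), then sort along axis 0
    # so the last slice is the maximum and the one before it is second.
    stacked = np.stack([f.to_numpy() for f in frames])
    ordered = np.sort(stacked, axis=0)
    largest = pd.DataFrame(ordered[-1], index=first.index, columns=first.columns)
    second = pd.DataFrame(ordered[-2], index=first.index, columns=first.columns)
    return largest, second


largest, second = ranked_frames(dfs.values())
```

Sorting the whole stack is more work than strictly needed for three frames, but it scales to any number of frames and gives every rank at once.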




Answer 2:


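The first question is easy to answer: NumPy's element-wise maximum finds the largest value in each cell across multiple dataframes. Note that numpy.maximum() compares only two arrays at a time (a third positional argument is treated as the output array), so use np.maximum.reduce() to fold it over all three:

dic['four'] = pd.DataFrame(np.maximum.reduce([dic['one'].values, dic['two'].values, dic['three'].values]), columns=list('ABC'))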
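
The second question (the second-largest value) is not covered above. For exactly three frames, one possible sketch stays with element-wise functions: the middle of three values a, b, c equals max(min(a, b), min(max(a, b), c)). The key 'five' and the local RandomState are assumptions made for illustration, and the identity does not generalise beyond three inputs.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dict of three same-shaped frames.
rng = np.random.RandomState(42)
dic = {}
for name in ['one', 'two', 'three']:
    dic[name] = pd.DataFrame(rng.randint(0, 100, size=(10, 3)), columns=list('ABC'))

a, b, c = (dic[name].to_numpy() for name in ['one', 'two', 'three'])

# Median-of-three identity, applied cell by cell:
# second largest = max(min(a, b), min(max(a, b), c))
second = np.maximum(np.minimum(a, b), np.minimum(np.maximum(a, b), c))

dic['five'] = pd.DataFrame(second, index=dic['one'].index, columns=dic['one'].columns)
```

When two of the three values tie for largest, this returns that tied value, which matches what sorting the three values and taking the middle one would give.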


Source: https://stackoverflow.com/questions/40000718/python-pandas-possible-to-compare-3-dfs-of-same-shape-using-wheremax-is-thi
