Subtracting values based on a relationship table

Submitted by 六月ゝ 毕业季﹏ on 2020-06-01 07:43:25

Question


I want to write some code that calculates a value for each target (down-gradient) location using a relationship table of targets and sources. The general formula is (value = down gradient - up gradient) or, in terms of my relationship table, (value = target - sum of all contributing source locations).

Operationally, what I want to do is similar to one of my other posts, only this time I want to use subtraction.

So, let's start with:

import pandas as pd
import networkx as nx
import numpy as np

df = pd.DataFrame({
"Site 1": np.random.rand(10),
"Site 2": np.random.rand(10),
"Site 3": np.random.rand(10),
"Site 4": np.random.rand(10),
"Site 5": np.random.rand(10),
"Site 6": np.random.rand(10)})

and the relationship table:

df_order = {'source': ["Site 1","Site 2", "Site 3", "Site 4", "Site 5", "Site 6"],
        'target': ["Site 3","Site 3","Site 5","Site 5", "Site 6","None"]
        }
dfo = pd.DataFrame(df_order, columns = ['source', 'target'])

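Visually, this is a directed network of sites, with each source draining to its target (the original figure is not preserved here).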

As a sample calculation, I can manually perform the operation on "Site 3" via:

df_sum = df.loc[:,'Site 1':'Site 2'].sum(axis = 1)
df_3_sub = df.loc[:, 'Site 3'].subtract(df_sum)
print(df_3_sub)
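
The label slice 'Site 1':'Site 2' only works because the two source columns happen to sit next to each other. As a minimal sketch, the direct sources of a target can instead be looked up in the relationship table. The numbers below are made up so the result can be checked by hand, and the three-site table is a cut-down assumption, not the full data above.

```python
import pandas as pd

# Small fixed values (made up for illustration) so the result can be checked by hand.
df = pd.DataFrame({
    "Site 1": [1.0, 2.0],
    "Site 2": [0.5, 0.5],
    "Site 3": [4.0, 5.0],
})
# Cut-down relationship table: Site 1 and Site 2 both drain to Site 3.
dfo = pd.DataFrame({
    "source": ["Site 1", "Site 2", "Site 3"],
    "target": ["Site 3", "Site 3", "None"],
})

target = "Site 3"
# Look up the direct sources of the target in the relationship table.
sources = dfo.loc[dfo["target"] == target, "source"].tolist()
# value = target - sum of contributing sources, row by row.
site3_sub = df[target] - df[sources].sum(axis=1)
print(site3_sub.tolist())  # [2.5, 2.5]
```

This only picks up direct sources. Sites further up gradient need a graph traversal.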

In the example I linked, I ended up with a nice solution (thanks to the respondent!) where I used:

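import networkx as nx

# Build a directed graph from the relationship table, skipping the "None" placeholder row
G = nx.from_pandas_edgelist(dfo[dfo['target'] != "None"],
                            source='source', target='target',
                            create_using=nx.DiGraph)
nx.draw(G, with_labels=True)

def all_preds(G, target):
    # Return the target followed by every site upstream of it
    preds = [target]
    for p in list(G.predecessors(target)):
        preds += all_preds(G, p)
    return preds

# For each site, sum its own column and all upstream columns
pd.concat([
    df[all_preds(G, target)].sum(1).rename(target)
    for target in dfo['source'].unique()
    ], axis=1)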
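
To make the recursion concrete, here is a small sketch of what all_preds collects for one site, next to networkx's built-in nx.ancestors. The edge list is an assumption: it hard-codes a network in which Site 3 and Site 4 both drain to Site 5, rather than reading the table.

```python
import networkx as nx

# Assumed edges (source, target): 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5, 5 -> 6.
G = nx.DiGraph([
    ("Site 1", "Site 3"), ("Site 2", "Site 3"),
    ("Site 3", "Site 5"), ("Site 4", "Site 5"),
    ("Site 5", "Site 6"),
])

def all_preds(G, target):
    # The target itself, followed by everything upstream of it.
    preds = [target]
    for p in G.predecessors(target):
        preds += all_preds(G, p)
    return preds

upstream_recursive = all_preds(G, "Site 5")
# nx.ancestors returns the same upstream sites as a set, without the target.
upstream_builtin = nx.ancestors(G, "Site 5")

print(upstream_recursive)
print(sorted(upstream_builtin))  # ['Site 1', 'Site 2', 'Site 3', 'Site 4']
```

One difference may matter on other networks: if two branches split and rejoin, the recursive list can contain the same site more than once, while nx.ancestors returns each site once.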

Now, I want to be able to essentially call .diff(1) instead of sum(1). Is there a relatively simple way to accomplish this?
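
A note on swapping the two calls: the first positional argument of DataFrame.diff is periods, not axis, so .diff(1) takes differences between consecutive rows, while .sum(1) sums across columns. Even diff(axis=1) only subtracts each column's left-hand neighbour, which is not the same as "target minus the sum of its sources". The sketch below contrasts the three using made-up numbers.

```python
import pandas as pd

# Made-up values; Site 1 and Site 2 are assumed to be the sources of Site 3.
df = pd.DataFrame({
    "Site 1": [1.0, 2.0],
    "Site 2": [0.5, 0.5],
    "Site 3": [4.0, 6.0],
})

# diff(1) means periods=1 along the rows: each row minus the row above it.
by_rows = df.diff(1)
print(by_rows.iloc[1].tolist())    # [1.0, 0.0, 2.0]

# diff(axis=1) subtracts the neighbouring column only: Site 3 - Site 2.
by_cols = df.diff(axis=1)
print(by_cols["Site 3"].tolist())  # [3.5, 5.5]

# What the question asks for: Site 3 - (Site 1 + Site 2).
wanted = df["Site 3"] - df[["Site 1", "Site 2"]].sum(axis=1)
print(wanted.tolist())             # [2.5, 3.5]
```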

Additionally, the most up-gradient points (the starting points) have no values contributing to them, so they do not need to be carried over to the new DataFrame. The subtraction should always read from the original DataFrame and write the result into a new DataFrame. To be clear, I am not putting the subtracted values back into the original DataFrame in place of, for example, the original "Site 3" values.

EDIT:

It is not super pretty or efficient, but I think I figured out how to go about this with a for loop:

result= pd.DataFrame()

for site in df2.columns:
    upgradient = df2[all_preds(G, site)].drop(site,axis=1).sum(axis=1)
    downgradient = df2[site]
    calc = downgradient.subtract(upgradient) 
    result.append(calc, ignore_index=True)

I think I just need help with the last part of the for loop so that the result is a cohesive DataFrame and the column names match the name in df2[site] at each step in the for loop. I welcome any thoughts, comments or modifications to my code!


Answer 1:


Well, I think I found one way to accomplish what I wanted to. I am sure there is a more efficient way, but this seems to work for me at the moment. I am still open to suggestions if there is a more elegant/efficient solution out there.

import pandas as pd
import networkx as nx
import numpy as np  


df2 = pd.DataFrame({
    "Site 1": np.random.rand(10),
    "Site 2": np.random.rand(10),
    "Site 3": np.random.rand(10),
    "Site 4": np.random.rand(10),
    "Site 5": np.random.rand(10),
    "Site 6": np.random.rand(10)})

print(df2)
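df_order2 = {'source': ["Site 1","Site 2", "Site 3", "Site 4", "Site 5", "Site 6"],
        'target': ["Site 3","Site 3","Site 5","Site 5", "Site 6","None"]
        }

dfo2 = pd.DataFrame(df_order2, columns = ['source', 'target'])
print(dfo2)

# Build the directed graph of source -> target edges, skipping the "None" placeholder row
G = nx.from_pandas_edgelist(dfo2[dfo2['target'] != "None"],
                            source='source', target='target',
                            create_using=nx.DiGraph)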

def all_preds(G, target):
    preds = [target]
    for p in list(G.predecessors(target)):
        preds += all_preds(G, p)
    return preds

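# Collect one Series per site: the site's values minus the sum of all upstream sites
result = []

for site in df2.columns:
    upgradient = df2[all_preds(G, site)].drop(site,axis=1).sum(axis=1)
    downgradient = df2[site]
    result.append(downgradient.subtract(upgradient))

# Combine the Series into one DataFrame and restore the site names as column labels
rfinal = pd.concat(result, axis=1)
rfinal.columns = df2.columns.values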
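
For comparison, here is a more compact sketch of the same calculation. It is one possible variation, not the only way to do it, and it makes three assumptions: the columns are collected in a dict so that pd.concat names them directly, nx.ancestors stands in for all_preds, and sites with no upstream contributors are skipped (the question says they need not be carried over). The seeded random generator is only there to make the run repeatable.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Repeatable random data for six sites (the seed is arbitrary).
rng = np.random.default_rng(0)
sites = [f"Site {i}" for i in range(1, 7)]
df2 = pd.DataFrame(rng.random((10, 6)), columns=sites)

dfo2 = pd.DataFrame({
    "source": sites,
    "target": ["Site 3", "Site 3", "Site 5", "Site 5", "Site 6", "None"],
})
# Directed graph of source -> target edges, without the "None" placeholder row.
G = nx.from_pandas_edgelist(dfo2[dfo2["target"] != "None"],
                            source="source", target="target",
                            create_using=nx.DiGraph)

# One entry per site that has at least one upstream contributor:
# the site's own values minus the sum of everything upstream of it.
columns = {
    site: df2[site] - df2[sorted(nx.ancestors(G, site))].sum(axis=1)
    for site in df2.columns
    if G.in_degree(site) > 0
}
# The dict keys become the column names, so no renaming step is needed.
rfinal = pd.concat(columns, axis=1)
print(rfinal.columns.tolist())  # ['Site 3', 'Site 5', 'Site 6']
```

df2 is only read from, so the original values stay untouched. Dropping the `if G.in_degree(site) > 0` clause keeps the starting sites as well, with their values unchanged.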


Source: https://stackoverflow.com/questions/61807858/subtracting-values-based-on-a-relationship-table
