pandas: Iterate through rows and columns and print them based on a condition

Question

I have an Excel file that I processed for data analysis and loaded into a pandas DataFrame.
Now I need to produce the result below. I'm trying to get it by iterating over the DataFrame's rows and columns with for loops and if conditions, but I'm not getting the desired output.
I used a hyphen (-) for empty cells in the Excel file so that I can apply conditions to them.

Excel File Input_File

Required Output
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: The output should be saved as plain text in a text file. No graph or visualization is needed.

Code

import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so keep the result

# The code below prints Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] +' -> '+ row['End_Name'])

# The code below prints A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] +' -> '+ row['Mid_Name_2'])

# The code below prints B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] +' -> '+ row['Mid_Name_2'])

Answer 1:

Setup:

The fronts dictionary maps the first name of each sequence to that sequence's position in sequences.

The backs dictionary maps the last name of each sequence to that sequence's position in sequences.

sequences is a list that holds all combined relations.

position_counter stores the position that the next new sequence will get.

from collections import deque
import pandas as pd

data = pd.read_csv("Names_relations.csv")

fronts = dict()
backs = dict()

sequences = []
position_counter = 0

Extract all: for each row, select the values that match the regex pattern. The "-" placeholders do not match, so only the two names in the row remain.

selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
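As a rough alternative sketch (not the answer's method), the same (front, back) pairs can be collected without a regex by keeping the cells in each row that are not the placeholder. This assumes every row names exactly two nodes and that "-" marks an empty cell. `extract_pairs` is a hypothetical helper name, and the inline frame only stands in for the first three rows of the CSV.

```python
import pandas as pd


def extract_pairs(frame, placeholder="-"):
    """Return one (front, back) tuple per row, skipping placeholder cells."""
    pairs = []
    for row in frame.itertuples(index=False):
        names = [str(v) for v in row if pd.notna(v) and str(v) != placeholder]
        if len(names) == 2:  # assumption: each relation row names exactly two nodes
            pairs.append((names[0], names[1]))
    return pairs


# Small stand-in for the first three rows of the CSV
sample = pd.DataFrame(
    {
        "Start_Name": ["A", "-", "-"],
        "Mid_Name_1": ["-", "B", "C"],
        "Mid_Name_2": ["B", "C", "E"],
        "End_Name": ["-", "-", "-"],
    }
)
pairs = extract_pairs(sample)
print(pairs)
```

Plain tuples are easier to inspect than the nested result of extractall, at the cost of silently skipping any row that does not contain exactly two names.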

For each relation in selector, get the extracted elements.

Put them in a deque.

Check whether the front of the new relation can be attached to the end of any previous sequence.

If so:

take the position of that sequence
take the sequence itself as llist2
remove the last element of llist2, which duplicates the front of the new relation
concatenate the two deques
update sequences with the combined deque
update backs with the position of the current end of the sequence
and finally remove the exhausted ends of the previous sequence from fronts and backs

The same applies analogously when back is in fronts.keys() (this branch is commented out in the code):

If no existing sequence matches the new relation:

save that relation as a new sequence
update fronts and backs with the position of that relation
update the position counter

for relation in selector:
    front, back = relation[0]  # the two names extracted from this row
    llist = deque((front, back))

    finb = front in backs  # does an existing sequence end with front?
#     binf = back in fronts.keys()

    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]

#     if binf:
#         position = fronts[back]
#         llist2 = sequences[position]
#         front_llist2 = llist2.popleft()
#         llist = llist + llist2
#         sequences[position] = llist
#         fronts[llist[0]] = position
#         if back in backs.keys():
#             del backs[back]
#         if front_llist2 in fronts.keys():
#             del fronts[front_llist2]

#     if not (finb or binf):
    if not finb: #(equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))
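The steps above can also be restated as one self-contained function. This is only a sketch under a few assumptions: the relations are already available as (front, back) tuples, `chain_relations` and `merge_backward` are hypothetical names, and, unlike the code above, it joins two existing chains when a new relation bridges them.

```python
from collections import deque


def chain_relations(pairs, merge_backward=True):
    """Merge (front, back) pairs into chains of names."""
    sequences = []  # deques; a slot becomes None once merged into another chain
    fronts = {}     # first name of a chain -> index in sequences
    backs = {}      # last name of a chain -> index in sequences

    for front, back in pairs:
        left = backs.pop(front, None)  # chain that ends with `front`
        right = fronts.pop(back, None) if merge_backward else None
        if left is not None and right is not None and left != right:
            # The new relation bridges two chains: join them into one.
            sequences[left].extend(sequences[right])
            sequences[right] = None
            pos = left
        elif left is not None:
            sequences[left].append(back)
            pos = left
        elif right is not None:
            sequences[right].appendleft(front)
            pos = right
        else:
            sequences.append(deque((front, back)))
            pos = len(sequences) - 1
        # Re-register the current ends of the chain that changed.
        chain = sequences[pos]
        fronts[chain[0]] = pos
        backs[chain[-1]] = pos

    return [list(s) for s in sequences if s is not None]


example = [("A", "B"), ("B", "C"), ("A1", "B1"), ("C1", "A1")]
for chain in chain_relations(example):
    print(" -> ".join(chain))
```

With `merge_backward=False` only the forward check is used, which corresponds to the active branch of the code above. With the default, a relation whose back matches the start of a chain is prepended to it. Like the original, the sketch assumes no two chains share the same first or last name.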

Outputs:

A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X

# If the binf branch is enabled:
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
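The question asks for the result in a plain text file rather than on the console. A minimal sketch of that last step, assuming the chains are already available as lists of names; `write_chains` and the `chains.txt` file in the temporary directory are placeholders, not part of the answer:

```python
import tempfile
from pathlib import Path


def write_chains(chains, path):
    """Write one 'A -> B -> C' line per chain to a plain text file."""
    lines = [" -> ".join(str(el) for el in chain) for chain in chains]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Placeholder location; any writable path works
out_path = Path(tempfile.gettempdir()) / "chains.txt"
lines = write_chains([["A", "B", "C"], ["Z", "X"]], out_path)
```

The same join expression as in the print loop is reused, so the file content matches what would be printed.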

Names_relations.csv

Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X

Source: https://stackoverflow.com/questions/64206914/pandas-iterate-through-rows-column-and-print-it-based-on-some-condition

Tags

python-3.x

pandas

for-loop