Question
I have an Excel file that I processed for data analysis and loaded into a pandas DataFrame.
Now I need to get the result shown below.
I'm trying to get it by iterating over the DataFrame's rows and columns with for loops and if conditions, but I'm not getting the desired output.
I used a hyphen (-) for the empty cells in the Excel file so that I can apply conditions to them.
Excel File Input_File
Required Output
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: The output should be saved as plain text in a text file. No graph or visualization is needed.
Code
import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so assign the result back

# The code below prints Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] + ' -> ' + row['End_Name'])

# The code below prints A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] + ' -> ' + row['Mid_Name_2'])

# The code below prints B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] + ' -> ' + row['Mid_Name_2'])
Answer 1:
Setup:
The fronts dictionary maps a name (key) to the position (value) of the sequence that starts with that name.
The backs dictionary maps a name (key) to the position (value) of the sequence that ends with that name.
sequences is a list that holds all combined relations.
position_counter holds the position that the next new sequence will get in sequences.
from collections import deque
import pandas as pd
data = pd.read_csv("Names_relations.csv")
fronts = dict()
backs = dict()
sequences = []
position_counter = 0
Extract all: for each row, select the values that match the regex pattern.
selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
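If `extractall` inside `apply` feels opaque, the same (front, back) pairs can also be collected with plain Python. This is a minimal sketch, assuming every row holds exactly two names and uses "-" for empty cells; `row_to_pair` is a hypothetical helper and not part of the code in this answer.

```python
def row_to_pair(cells, placeholder="-"):
    """Return the (front, back) names of one row, skipping placeholder cells."""
    names = [str(c).strip() for c in cells if str(c).strip() != placeholder]
    if len(names) != 2:
        # Assumption: a valid relation row contains exactly two names.
        raise ValueError(f"expected exactly two names, got {names!r}")
    return tuple(names)


# A few rows in the same column layout as the input file:
# Start_Name, Mid_Name_1, Mid_Name_2, End_Name
rows = [
    ("A", "-", "B", "-"),
    ("-", "B", "C", "-"),
    ("Z", "-", "-", "X"),
]
pairs = [row_to_pair(r) for r in rows]
print(pairs)  # [('A', 'B'), ('B', 'C'), ('Z', 'X')]
```

With a DataFrame, the rows could come from `data.itertuples(index=False, name=None)`, provided missing cells have already been replaced by "-".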
For each relation in selector, get the extracted elements.
Put them into a deque (llist).
Check whether the front of the new relation can be attached to any previous sequence, i.e. front in backs.keys().
If so:
- take the position of that sequence
- take the sequence itself as llist2
- remove the last, duplicated element from llist2
- join the two sequences
- update sequences with the connected llists
- update backs with the position of the current end of the sequence
- and finally remove the exhausted ends of the previous sequence from fronts and backs
The case back in fronts.keys() is handled analogously.
If no existing sequence matches the new relation:
- save that relation
- update fronts and backs with the position of that relation
- update the position counter
for relation in selector:
    front, back = relation[0]
    llist = deque((front, back))
    finb = front in backs.keys()
    # binf = back in fronts.keys()
    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]
    # if binf:
    #     position = fronts[back]
    #     llist2 = sequences[position]
    #     front_llist2 = llist2.popleft()
    #     llist = llist + llist2
    #     sequences[position] = llist
    #     fronts[llist[0]] = position
    #     if back in backs.keys():
    #         del backs[back]
    #     if front_llist2 in fronts.keys():
    #         del fronts[front_llist2]
    # if not (finb or binf):
    if not finb:  # (equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))
Outputs:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
#if binf is active:
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
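With both directions enabled, the whole procedure can be packaged as one function. The following is a sketch rather than the exact code of this answer: `build_chains`, `starts` and `ends` are hypothetical names, and it assumes the relations do not branch (each name has at most one predecessor and one successor). It also covers a relation that bridges two existing sequences, a case the commented-out branch above does not treat separately.

```python
from collections import deque


def build_chains(pairs):
    """Merge (front, back) pairs into chains, kept in order of first appearance."""
    chains = []   # list of deques; a slot becomes None once merged into another
    starts = {}   # first name of a chain -> its index in chains
    ends = {}     # last name of a chain  -> its index in chains
    for front, back in pairs:
        left = ends.get(front)     # chain that currently ends with `front`
        right = starts.get(back)   # chain that currently starts with `back`
        if left is not None and right is not None and left != right:
            # The pair bridges two chains: append the right one to the left one.
            del ends[front], starts[back]
            chains[left].extend(chains[right])
            ends[chains[left][-1]] = left
            chains[right] = None
        elif left is not None:
            # Attach `back` to the end of an existing chain.
            del ends[front]
            chains[left].append(back)
            ends[back] = left
        elif right is not None:
            # Attach `front` to the start of an existing chain.
            del starts[back]
            chains[right].appendleft(front)
            starts[front] = right
        else:
            # No match: the pair starts a new chain.
            starts[front] = ends[back] = len(chains)
            chains.append(deque((front, back)))
    return [list(c) for c in chains if c is not None]


# The relations from the input file, in row order.
pairs = [("A", "B"), ("B", "C"), ("C", "E"), ("F", "G"), ("H", "J"), ("E", "I"),
         ("J", "K"), ("G", "L"), ("A1", "B1"), ("C1", "A1"), ("Z", "X")]
for chain in build_chains(pairs):
    print(" -> ".join(chain))
```

On the sample relations this prints the same five lines as the variant with binf active.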
Names_relations.csv
Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X
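The question asks for the result in a plain text file rather than on screen. A minimal sketch of that last step, assuming the chains are available as a list of sequences like `sequences` in the code above; `write_chains` and the file name `chains.txt` are placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path


def write_chains(chains, path):
    """Write one chain per line, with names joined by ' -> '."""
    lines = [" -> ".join(str(name) for name in chain) for chain in chains]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Two sample chains; in practice this would be the `sequences` list.
chains = [["A", "B", "C", "E", "I"], ["Z", "X"]]

# Written to the system temp directory here so the example leaves no stray files.
out_path = Path(tempfile.gettempdir()) / "chains.txt"
write_chains(chains, out_path)
print(out_path.read_text(encoding="utf-8"))
```

Because each element is passed through `str`, the function accepts deques or lists alike.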
Source: https://stackoverflow.com/questions/64206914/pandas-iterate-through-rows-column-and-print-it-based-on-some-condition