pandas: Iterate through Rows & Columns and print them based on some condition

谁都会走 submitted on 2021-01-28 11:19:40

Question


I have an Excel file that I processed for data analysis, and I created a pandas DataFrame from it.
Now I need to get the result. I'm trying to get it by iterating over the DataFrame's rows and columns with for loops and if conditions, but I'm not getting the desired output.
I've put a hyphen (-) in the empty cells of the Excel file so that I can apply some conditions.
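Since each row of this layout holds exactly two real names and hyphens everywhere else, one way to express that condition is to drop the hyphens row by row and keep what remains as a (from, to) pair, instead of testing every column combination separately. A minimal sketch, assuming the column names from the code below and a small hand-built DataFrame standing in for Test.xlsx:

```python
import pandas as pd

# Hand-built stand-in for Test.xlsx (a few rows of the sample data); '-' marks an empty cell
df = pd.DataFrame(
    [["A", "-", "B", "-"],
     ["-", "B", "C", "-"],
     ["-", "E", "-", "I"],
     ["Z", "-", "-", "X"]],
    columns=["Start_Name", "Mid_Name_1", "Mid_Name_2", "End_Name"],
)

# For each row, keep only the non-hyphen cells, in column order
pairs = [tuple(v for v in row if v != "-") for row in df.itertuples(index=False)]

for first, second in pairs:
    print(f"{first} -> {second}")
```

This yields only the individual links; chaining them into longer paths such as A -> B -> C is what the answer below does.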

Excel file: Input_File

Required output:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: the output is saved as plain text in a text file; no graph or visualization is needed.
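Writing the chains to a plain text file needs nothing beyond the standard library. A minimal sketch, assuming each chain is already a list of names (the chains below are hard-coded for illustration) and using a placeholder file name, output.txt, in the system temp directory:

```python
import os
import tempfile

# Hard-coded example chains; in practice these come from the chaining step
chains = [["A", "B", "C", "E", "I"], ["F", "G", "L"], ["Z", "X"]]

# "output.txt" is a placeholder name; the temp directory keeps the sketch runnable anywhere
out_path = os.path.join(tempfile.gettempdir(), "output.txt")

with open(out_path, "w", encoding="utf-8") as fh:
    for chain in chains:
        fh.write(" -> ".join(chain) + "\n")  # one chain per line

with open(out_path, encoding="utf-8") as fh:
    print(fh.read(), end="")
```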

Code:

import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so assign the result back

# The loop below gives Z -> X (Start_Name -> End_Name)
for _, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] + ' -> ' + row['End_Name'])

# The loop below gives A -> B / F -> G / H -> J / C1 -> A1 (Start_Name -> Mid_Name_2)
for _, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] + ' -> ' + row['Mid_Name_2'])

# The loop below gives B -> C / C -> E (Mid_Name_1 -> Mid_Name_2)
for _, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] + ' -> ' + row['Mid_Name_2'])
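One detail in the original code: DataFrame.fillna returns a new DataFrame by default, so a bare df.fillna('-') leaves df unchanged unless the result is assigned back (or inplace=True is passed). This only matters for cells that are truly empty rather than holding a literal hyphen. A quick check on a tiny hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one truly empty cell per column
df = pd.DataFrame({"Start_Name": ["A", np.nan], "End_Name": [np.nan, "X"]})

df.fillna("-")                       # returns a filled copy that is thrown away
missing_before = int(df.isna().sum().sum())
print(missing_before)                # 2: df itself is unchanged

df = df.fillna("-")                  # keep the filled copy
missing_after = int(df.isna().sum().sum())
print(missing_after)                 # 0
```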


Answer 1:


Setup:

fronts is a dictionary that maps a name (key) to the position of the sequence that starts with that name.

backs is a dictionary that maps a name (key) to the position of the sequence that ends with that name.

sequences is a list that holds all the combined relations (chains).

position_counter stores the position that the next new sequence will get, i.e. the number of sequences created so far.

from collections import deque
import pandas as pd

data = pd.read_csv("Names_relations.csv")

fronts = dict()
backs = dict()

sequences = []
position_counter = 0

extractall: for each row, select the values that match the regex pattern (the names, skipping the hyphens).

selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
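To see what selector holds, here is the same extractall call applied to a single row built by hand from the first line of the CSV. Each match becomes one row of a small DataFrame indexed by (column name, match number), and column 0 holds the captured name, which is why the loop can unpack relation[0] into front and back. The pattern is written as r"(\w+)" here; \w already covers digits, so it matches the same names as [\w\d]+.

```python
import pandas as pd

# First data row of the CSV: A,-,B,-
row = pd.Series(["A", "-", "B", "-"],
                index=["Start_Name", "Mid_Name_1", "Mid_Name_2", "End_Name"])

matches = row.str.extractall(r"(\w+)")  # '-' has no word characters, so it is skipped
print(matches)

front, back = matches[0]   # column 0 holds the captured names, in column order
print(front, "->", back)   # A -> B
```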

For each relation in selector, get the extracted elements (front and back).

Put them into a deque.

Check whether the front of the new relation can be attached to the end of a previous sequence, i.e. whether front is a key of backs.

If so:

  1. take the position of that sequence,
  2. take the sequence itself as llist2,
  3. remove the last (duplicated) element from llist2,
  4. concatenate the two deques,
  5. store the connected deque in sequences at that position,
  6. update backs with the position of the new end of the sequence,
  7. and finally remove the exhausted ends of the previous sequence from fronts and backs.

Attaching the back of the new relation to the start of an existing sequence (back in fronts.keys()) works analogously; that branch is commented out in the code below.

If no existing sequence matches the new relation:

  1. save that relation as a new sequence,
  2. update fronts and backs with the position of that relation,
  3. increment the position counter.

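Steps 1 to 7 can be traced by hand on the first two CSV rows. After A,-,B,- there is one sequence, A -> B, stored at position 0; the next row, -,B,C,-, attaches to it because B is a key of backs. A minimal sketch of just that step, with the starting state written out directly rather than built by the loop (dict.pop with a default stands in for the `if ... in ...: del` checks):

```python
from collections import deque

# State after the first row (A -> B), written out by hand
sequences = [deque(["A", "B"])]
fronts = {"A": 0}
backs = {"B": 0}

front, back = "B", "C"             # second row: -,B,C,-
llist = deque((front, back))

if front in backs:                 # front continues an existing sequence
    position = backs[front]        # 1. position of that sequence
    llist2 = sequences[position]   # 2. the sequence itself
    old_back = llist2.pop()        # 3. drop its last element (same as front)
    llist = llist2 + llist         # 4. concatenate the two deques
    sequences[position] = llist    # 5. store the combined sequence
    backs[llist[-1]] = position    # 6. record the new end
    fronts.pop(front, None)        # 7. forget the exhausted ends
    backs.pop(old_back, None)

print(" -> ".join(sequences[0]))   # A -> B -> C
print(fronts, backs)               # {'A': 0} {'C': 0}
```

The loop that follows runs these same steps for every relation.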
for relation in selector:
    # relation[0] holds this row's two captured names, in column order
    front, back = relation[0]
    llist = deque((front, back))

    finb = front in backs    # front continues an existing sequence
#     binf = back in fronts  # back precedes an existing sequence

    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()   # drop the duplicated joint element
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        # remove the exhausted ends of the previous sequence
        if front in fronts:
            del fronts[front]
        if back_llist2 in backs:
            del backs[back_llist2]

#     if binf:
#         position = fronts[back]
#         llist2 = sequences[position]
#         front_llist2 = llist2.popleft()
#         llist = llist + llist2
#         sequences[position] = llist
#         fronts[llist[0]] = position
#         if back in backs:
#             del backs[back]
#         if front_llist2 in fronts:
#             del fronts[front_llist2]

#     if not (finb or binf):   # use this condition when binf is enabled
    if not finb:               # no match: start a new sequence (same as 'else:' here)
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))

Outputs:

A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X

# With the binf lines uncommented (and 'if not (finb or binf):' in place of 'if not finb:'):
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
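The commented-out binf branch is the mirror image: it checks whether the back of the new relation is the start of an existing sequence and, if so, prepends the relation. Traced by hand on the A1 rows, with the state written out as if A1 -> B1 were the only sequence so far, it turns C1 -> A1 into C1 -> A1 -> B1:

```python
from collections import deque

# State as if A1 -> B1 (row -,A1,-,B1) were the only sequence so far
sequences = [deque(["A1", "B1"])]
fronts = {"A1": 0}
backs = {"B1": 0}

front, back = "C1", "A1"               # row C1,-,A1,-
llist = deque((front, back))

if back in fronts:                     # back is the start of an existing sequence
    position = fronts[back]
    llist2 = sequences[position]
    old_front = llist2.popleft()       # drop its first element (same as back)
    llist = llist + llist2             # prepend the new relation
    sequences[position] = llist
    fronts[llist[0]] = position        # record the new start
    backs.pop(back, None)              # forget the exhausted ends
    fronts.pop(old_front, None)

print(" -> ".join(sequences[0]))       # C1 -> A1 -> B1
```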

Names_relations.csv

Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X


Source: https://stackoverflow.com/questions/64206914/pandas-iterate-through-rows-column-and-print-it-based-on-some-condition
