Splitting Rows in csv on several header rows

一向 2021-01-03 15:52

I am very new to Python, so please be gentle.

I have a .csv file that is delivered to me in this format, so I cannot change it:

ClientAccountID   Acc         


        
2 Answers
  •  刺人心 (original poster)
    2021-01-03 16:22

    If your data is not comma- or tab-delimited, you can use str.split and combine it with itertools.groupby to separate the header lines from the data rows:

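    from itertools import groupby
    from pprint import pprint as pp
    
    # Python 3: the built-in zip and map are already lazy (izip/imap were Python 2 only).
    with open("test.txt") as f:
        # Split each line on whitespace; filter(None, ...) drops blank lines,
        # which str.split turns into [] (x[0] would raise IndexError on them).
        rows = filter(None, map(str.split, f))
        # Group consecutive lines by whether they are a header line.
        grps, data = groupby(rows, lambda x: x[0] == "ClientAccountID"), []
        for k, v in grps:
            if k:
                names = next(v)              # the header row
                vals = zip(*next(grps)[1])   # transpose the data rows that follow it
                data.append(dict(zip(names, vals)))
    
    pp(data)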
    

    Output:

    [{'AccountAlias': ('SomeAlias', 'OtherAlias'),
      'ClientAccountID': ('SomeID', 'OtherID'),
      'CurrencyPrimary': ('SomeCurr', 'OtherCurr'),
      'FromDate': ('SomeDate', 'OtherDate')},
     {'AccountAlias': ('SomeAlias', 'OtherAlias', 'AnotherAlias'),
      'AssetClass': ('SomeClass', 'OtherDate', 'AnotherDate'),
      'ClientAccountID': ('SomeID', 'OtherID', 'AnotherID'),
      'CurrencyPrimary': ('SomeCurr', 'OtherCurr', 'AnotherCurr')}]
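
    To try the same grouping without a test.txt file, here is a minimal sketch that wraps it in a function and feeds it an in-memory io.StringIO. The helper name split_on_headers, its header_key parameter and the sample rows are hypothetical; the sketch assumes every header line is followed by at least one data row and that no value contains spaces.

```python
import io
from itertools import groupby


def split_on_headers(lines, header_key="ClientAccountID"):
    """Hypothetical helper: one dict per header block, column -> tuple of values."""
    # Split on whitespace and drop blank lines (str.split returns [] for them).
    rows = filter(None, map(str.split, lines))
    # groupby alternates between header groups (True) and data groups (False).
    groups = groupby(rows, key=lambda row: row[0] == header_key)
    blocks = []
    for is_header, grp in groups:
        if is_header:
            names = next(grp)            # header row of this block
            data_rows = next(groups)[1]  # data rows up to the next header
            # zip(*data_rows) transposes rows into columns.
            blocks.append(dict(zip(names, zip(*data_rows))))
    return blocks


# Hypothetical sample input: two header blocks with different columns.
sample = io.StringIO(
    "ClientAccountID AccountAlias CurrencyPrimary\n"
    "SomeID SomeAlias SomeCurr\n"
    "OtherID OtherAlias OtherCurr\n"
    "\n"
    "ClientAccountID AccountAlias AssetClass CurrencyPrimary\n"
    "SomeID SomeAlias SomeClass SomeCurr\n"
)

for block in split_on_headers(sample):
    print(block)
```

    Because the key function only inspects the first field, each block keeps its own column set: the second hypothetical block gains an AssetClass column without affecting the first.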
    

    If the file is tab-delimited, use csv.reader in place of str.split:

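    import csv
    from itertools import groupby
    
    with open("test.txt", newline="") as f:
        # csv.reader splits on tabs only, so values may contain spaces;
        # it yields [] for blank lines, which filter(None, ...) drops.
        rows = filter(None, csv.reader(f, delimiter="\t"))
        grps, data = groupby(rows, lambda x: x[0] == "ClientAccountID"), []
        for k, v in grps:
            if k:
                names = next(v)
                vals = zip(*next(grps)[1])
                data.append(dict(zip(names, vals)))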
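
    If one dict per data row is more convenient than one tuple per column, a possible variation keeps the same csv.reader/groupby grouping and only changes how each block is collected. The function name records_by_header, its parameters and the sample data are hypothetical, and, as before, every header line is assumed to be followed by at least one data row.

```python
import csv
import io
from itertools import groupby


def records_by_header(f, header_key="ClientAccountID", delimiter="\t"):
    """Hypothetical helper: a list of row dicts for each header block."""
    # csv.reader yields [] for blank lines; filter(None, ...) drops them.
    rows = filter(None, csv.reader(f, delimiter=delimiter))
    groups = groupby(rows, key=lambda row: row[0] == header_key)
    blocks = []
    for is_header, grp in groups:
        if is_header:
            names = next(grp)            # header row of this block
            data_rows = next(groups)[1]  # data rows up to the next header
            # One dict per data row, keyed by this block's own header.
            blocks.append([dict(zip(names, row)) for row in data_rows])
    return blocks


# Hypothetical tab-separated sample; for a real file, open it with
# open(path, newline="") as the csv module documentation recommends.
sample = io.StringIO(
    "ClientAccountID\tAccountAlias\tFromDate\n"
    "SomeID\tSome Alias\tSomeDate\n"
    "ClientAccountID\tAssetClass\n"
    "OtherID\tOtherClass\n"
    "AnotherID\tAnotherClass\n"
)

for block in records_by_header(sample):
    for record in block:
        print(record)
```

    Since fields are split on tabs only, a value such as "Some Alias" keeps its space, whereas the str.split version would break it into two fields.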
    
