How to pack multi-line results (OrderedDicts with different keys) into a pandas DataFrame? [duplicate]

Submitted by China☆狼群 on 2021-01-29 15:34:14

Question


I loop over multiple files and get an OrderedDict as the result for each one. The keys can differ from one OrderedDict to another. I want to pack all the results into a single pandas DataFrame, so that the union of all the keys forms the column names and each row represents one OrderedDict.

My results already look like this:

OrderedDict([('mrz_type', 'ID'), ('valid_score', 70), ('valid_composite', False), ('type', 'ID'), ('country', ''), ('number', ''), ('date_of_birth', '840927'), ('sex', 'F'), ('nom', ), ('prenom', ''), ('dep', ''), ('service', '1'), ('office', '056'), ('check_number', '7'), ('check_date_of_birth', '4'), ('check_composite', '9'), ('valid_number', True), ('valid_date_of_birth', True)])

OrderedDict([('mrz_type', 'PASP'), ('valid_score', 62), ('valid_composite', False), ('type', 'P'), ('country', ''), ('number', ''), ('date_of_birth', '550912'), ('expiration_date', '200801'), ('nationality', ''), ('sex', 'M'), ('nom', ''), ('prenom', ''), ('check_number', '2'), ('check_date_of_birth', '9'), ('check_expiration_date', '1'), ('check_composite', '8'), ('valid_number', True), ('valid_date_of_birth', False), ('valid_expiration_date', True)])

OrderedDict([('mrz_type', 'IR'), ('valid_score', 28), ('valid_composite', False), ('type', 'IR'), ('country', ''), ('number', ''), ('date_of_birth', '750612'), ('expiration_date', '010119'), ('nationality', ''), ('sex', 'Z'), ('nom', ''), ('prenom', ''), ('num_etrg', ''), ('check_number', '6'), ('check_date_of_birth', '1'), ('check_expiration_date', ''), ('check_composite', ''), ('valid_number', False), ('valid_date_of_birth', True), ('valid_expiration_date', False)])

Answer 1:


This uses the three example OrderedDicts you provided. Note that the first one contains a single-element tuple ('nom', ), which I changed to ('nom',' ').

Follow these four steps to get the desired result:
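A minimal sketch of why that change is needed: OrderedDict expects every entry to be a (key, value) pair, so a one-element tuple is rejected. The variable names `failed` and `fixed` are only illustrative.

```python
from collections import OrderedDict

# A one-element tuple is not a valid (key, value) pair
try:
    OrderedDict([("nom",)])
    failed = False
except ValueError:
    failed = True

# Supplying a value, as in the corrected example, works
fixed = OrderedDict([("nom", " ")])
print(failed, fixed["nom"] == " ")
```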

import pandas as pd

# od1, od2 and od3 are the three OrderedDicts shown in the question
list_of_ordered_dicts = [od1, od2, od3]

# flatten the keys of all dicts into one list
all_keys = [key for ordered_dict in list_of_ordered_dicts
                for key in ordered_dict]

# build the list of all columns, deduplicated in order of first appearance
# (a list, unlike a set, keeps the column order deterministic)
all_columns = list(dict.fromkeys(all_keys))

# update the ordered dicts in place, setting missing columns to None
for ordered_dict in list_of_ordered_dicts:
    for column in all_columns:
        ordered_dict.setdefault(column, None)

# create the dataframe: one row per OrderedDict, one column per key
df = pd.DataFrame(list_of_ordered_dicts, columns=all_columns)
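As an alternative sketch, not part of the original answer: the DataFrame constructor accepts a list of dicts with different keys directly, using the union of the keys as columns and filling absent entries with NaN. The `records` below are shortened, hypothetical versions of the OrderedDicts in the question.

```python
from collections import OrderedDict

import pandas as pd

# Shortened, hypothetical records modelled on the question's data
records = [
    OrderedDict([("mrz_type", "ID"), ("valid_score", 70), ("office", "056")]),
    OrderedDict([("mrz_type", "PASP"), ("valid_score", 62), ("expiration_date", "200801")]),
    OrderedDict([("mrz_type", "IR"), ("valid_score", 28), ("num_etrg", "")]),
]

# pandas uses the union of all keys as columns;
# a key that is absent from a record becomes NaN in that row
df = pd.DataFrame(records)
print(df)
```

With this approach the gaps hold NaN rather than None, and the input dicts are left unmodified.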


Source: https://stackoverflow.com/questions/57061143/how-to-pack-multilines-results-ordereddict-with-differents-keys-into-pandas-da
