Converting Pandas DataFrame to JSON

随声附和 提交于 2020-12-27 05:53:26

问题


I've data stored in pandas dataframe and I want to convert tat into a JSON format. Example data can be replicated using following code

data = {'Product':['A', 'B', 'A'],
        'Zone':['E/A', 'A/N', 'E/A'],
        'start':['08:00:00', '09:00:00', '12:00:00'],
        'end':['12:30:00', '17:00:00', '17:40:00'],
        'seq':['0, 1, 2 ,3 ,4','0, 1, 2 ,3 ,4', '0, 1, 2 ,3 ,4'],
        'store':['Z',"'AS', 'S'", 'Z']
        }

df = pd.DataFrame(data)

I've tried converting it into JSON format using following code

df_parsed = json.loads(df.to_json(orient="records"))

Output generated from above

[{'Product': 'A', 'Zone': 'E/A', 'start': '08:00:00', 'end': '17:40:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'Z'}, {'Product': 'B', 'Zone': 'A/N', 'start': '09:00:00', 'end': '17:00:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'AS'}, {'Product': 'A', 'Zone': 'E/A', 'start': '08:00:00', 'end': '17:40:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'Z'}]

Desired Result:

{
'A': {'Zone': 'E/A', 
'tp': [{'start': [8, 0], 'end': [12, 0], 'seq': [0, 1, 2 ,3 ,4]},
      {'start': [12, 30], 'end': [17, 40], 'seq': [0, 1, 2 ,3 ,4]}],
      
'store': ['Z']
}, 
'B': {'Zone': 'A/N', 
'tp': [{'start': [9, 0], 'end': [17, 0], 'seq': [0, 1, 2 ,3 ,4]}],
      
'store': ['AS', 'S']
}
}

If a product belongs to same store the result for column start, end and seq should be clubbed as shown in desired output. Also start time and end time should be represented like [9,0] if value for time is "09:00:00" only hour and minute needs to be represented so we can discard value of seconds from time columns.


回答1:


This will be complicated a bit. So you have to do it step by step:

def funct(row):
    row['start'] = row['start'].str.split(':').str[0:2]
    row['end'] = row['end'].str.split(':').str[0:2]
    row['store'] = row['store'].str.replace("'", "").str.split(', ')

    d = (row.groupby('Zone')[row.columns[1:]]
        .apply(lambda x: x.to_dict(orient='record'))
        .reset_index(name='tp').to_dict(orient='row'))
    return d

di = df.groupby(['Product'])[df.columns[1:]].apply(funct).to_dict()

di:

{'A': [{'Zone': 'E/A',
   'tp': [{'start': ['08', '00'],
     'end': ['12', '30'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['Z']},
    {'start': ['12', '00'],
     'end': ['17', '40'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['Z']}]}],
 'B': [{'Zone': 'A/N',
   'tp': [{'start': ['09', '00'],
     'end': ['17', '00'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['AS', 'S']}]}]}

Explanation:

  • 1st create your own custom function.
  • change the start, end column to a list form.
  • group by Zone and apply to_dict to rest of the columns.
  • reset index and name the column that are having [{'start': ['08', '00'], 'end': ['12', '30'], 'seq': '0, 1, 2 ,3 ,4', as tp.
  • now apply to_dict to the whole result and return it.

Ultimately you need to convert your dataframe into this below format once you are able to do it the rest of the thing will become easy for you.

Zone    tp
E/A    [{'start': ['08', '00'], 'end': ['12', '30'], ...
A/N    [{'start': ['09', '00'], 'end': ['17', '00'], ... 

EDIT:

import pandas as pd
import ast

def funct(row):
    y = row['start'].str.split(':').str[0:-1]
    row['start'] = row['start'].str.split(':').str[0:2].apply(lambda x: list(map(int, x)))
    row['end'] = row['end'].str.split(':').str[0:2].apply(lambda x: list(map(int, x)))
    row['seq'] = row['seq'].apply(lambda x: list(map(int, ast.literal_eval(x))))
    row['store'] = row['store'].str.replace("'", "")

    d = (row.groupby('Zone')[row.columns[1:-1]]
        .apply(lambda x: x.to_dict(orient='record'))
        .reset_index(name='tp'))
    ######### For store create a different dataframe and then merge it to the other df ########
    d1 = (row.groupby('Zone').agg({'store': pd.Series.unique}))
    d1['store'] = d1['store'].str.split(",")
    d_merged = (pd.merge(d,d1, on='Zone', how='left')).to_dict(orient='record')[0]
    return d_merged

di = df.groupby(['Product'])[df.columns[1:]].apply(funct).to_dict()

di:

{'A': {'Zone': 'E/A',
  'tp': [{'start': [8, 0], 'end': [12, 30], 'seq': [0, 1, 2, 3, 4]},
   {'start': [12, 0], 'end': [17, 40], 'seq': [0, 1, 2, 3, 4]}],
  'store': ['Z']},
 'B': {'Zone': 'A/N',
  'tp': [{'start': [9, 0], 'end': [17, 0], 'seq': [0, 1, 2, 3, 4]}],
  'store': ['AS', ' S']}}
 


来源:https://stackoverflow.com/questions/65357356/converting-pandas-dataframe-to-json

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!