Quickly convert a JSON column into a pandas DataFrame

时光说笑 2020-12-08 11:38

I'm reading data from a database (50k+ rows) where one column is stored as JSON. I want to extract that column into a pandas DataFrame. The snippet below works fine but is fairly

3 Answers
  • 2020-12-08 11:59

    I think you can first parse each JSON string in the column into a dict with json.loads, collect the dicts into a list with values.tolist(), and then pass that list to DataFrame.from_records:

    import json
    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                     header=None, index_col=0, names=['data'])

    # Parse every JSON string into a dict; the result is a list of dicts
    a = df.data.apply(json.loads).values.tolist()
    print(pd.DataFrame.from_records(a))
    

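    Another idea, using json_normalize on the parsed values:

    # json_normalize expects parsed objects, so decode the strings first
    df = pd.json_normalize(df['data'].apply(json.loads))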
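Since the pastebin file may not be reachable, here is a minimal self-contained sketch of both routes above. The sample strings and the names `records`, `via_records`, and `via_normalize` are hypothetical and assume each cell holds one flat JSON object.

```python
import json

import pandas as pd

# Hypothetical sample data: each cell holds one flat JSON object as a string.
df = pd.DataFrame(
    {"data": ['{"a": 1, "b": "x"}', '{"a": 2, "b": "y"}', '{"a": 3, "b": "z"}']}
)

# Parse every string once; the result is a plain list of dicts.
records = [json.loads(s) for s in df["data"]]

# Route 1: build the frame directly from the list of dicts.
via_records = pd.DataFrame.from_records(records)

# Route 2: json_normalize on the same parsed records.
via_normalize = pd.json_normalize(records)

print(via_records)
print(via_normalize)
```

For flat records like these, both routes should give the same columns, so the choice mostly matters once the JSON is nested.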
    
  • 2020-12-08 12:09

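    json_normalize takes already parsed JSON (a dict or a list of dicts) or a pandas Series of such objects, not raw JSON strings, so decode the column with json.loads first.

    # pd.io.json.json_normalize is deprecated since pandas 1.0; use pd.json_normalize
    pd.json_normalize(df.data.apply(json.loads))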
    

    Setup:

    import pandas as pd
    import json
    
    df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
                     header=None, index_col=0, names=['data'])
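To illustrate why the strings are parsed before json_normalize is called, here is a small sketch with nested objects. The payloads and the names `raw`, `parsed`, and `flat` are made up for illustration; the real column may have a different shape.

```python
import json

import pandas as pd

# Hypothetical nested payloads stored as JSON strings.
raw = pd.Series(
    [
        '{"id": 1, "user": {"name": "ann", "age": 30}}',
        '{"id": 2, "user": {"name": "bob", "age": 25}}',
    ]
)

# Step 1: decode each string into a dict.
parsed = raw.apply(json.loads)

# Step 2: flatten the dicts; nested keys become dotted column names.
flat = pd.json_normalize(parsed.tolist())

print(flat.columns.tolist())
print(flat)
```

The nested "user" object is spread over the columns user.name and user.age, which is the main thing json_normalize adds over building a frame from the dicts directly.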
    
  • 2020-12-08 12:10

    import pandas as pd

    data = {
        "events": [
            {"timemillis": 1563467463580, "date": "18.7.2019",
             "time": "18:31:03,580", "name": "Player is loading", "data": ""},
            {"timemillis": 1563467463668, "date": "18.7.2019",
             "time": "18:31:03,668", "name": "Player is loaded", "data": "5"},
        ]
    }

    # The second argument is record_path: each item of "events" becomes a row
    result = pd.json_normalize(data, 'events')
    print(result)
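The call above passes 'events' as the record path, so each element of that list becomes one row. As a hedged sketch of how this extends to payloads that also carry top-level fields, the example below adds a made-up "session" key and pulls it in with the meta parameter; the payload and the name `events` are assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical payload: a top-level "session" field plus the "events" list.
payload = {
    "session": "abc",
    "events": [
        {"timemillis": 1, "name": "Player is loading"},
        {"timemillis": 2, "name": "Player is loaded"},
    ],
}

# record_path selects the list to expand into rows;
# meta copies the listed top-level fields onto every row.
events = pd.json_normalize(payload, record_path="events", meta=["session"])

print(events)
```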
    