I'm reading data from a database (50k+ rows) where one column is stored as JSON. I want to extract that column into a pandas DataFrame. The snippet below works, but it is fairly slow.
I think you can first parse the string column into dicts with json.loads, then collect those dicts into a list, and finally pass that list to DataFrame.from_records:
import json
import pandas as pd

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])
a = df['data'].apply(json.loads).tolist()
print(pd.DataFrame.from_records(a))
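Since the pastebin link may no longer resolve, here is a minimal self-contained sketch of the same approach; the inline Series of JSON strings stands in for the column read from the database:

```python
import json

import pandas as pd

# Stand-in for the 'data' column: each cell is a JSON string.
s = pd.Series([
    '{"a": 1, "b": 2}',
    '{"a": 3, "b": 4}',
])

# Parse each string into a dict, then build the frame in one pass.
records = s.apply(json.loads).tolist()
df = pd.DataFrame.from_records(records)
print(df)
#    a  b
# 0  1  2
# 1  3  4
```

Parsing once and calling from_records on the full list avoids building one small DataFrame per row.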
Another idea is json_normalize. Note that it takes already-parsed JSON (a dict, or a list of dicts), not raw strings, so parse the column with json.loads first:

pd.json_normalize(df['data'].apply(json.loads))

(In pandas < 1.0 this function lived at pd.io.json.json_normalize.)
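A self-contained sketch of that pattern on inline data (the column name 'data' mirrors the frame above; the nested key is invented for illustration):

```python
import json

import pandas as pd

df = pd.DataFrame({'data': ['{"x": 1, "y": {"z": 2}}',
                            '{"x": 3, "y": {"z": 4}}']})

# Parse the strings first; json_normalize then flattens nested
# objects into dotted column names like 'y.z'.
flat = pd.json_normalize(df['data'].apply(json.loads).tolist())
print(flat)
#    x  y.z
# 0  1    2
# 1  3    4
```

The flattening of nested objects is what distinguishes json_normalize from plain DataFrame.from_records.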
Setup:

import json
import pandas as pd

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])
data = {
    "events": [
        {
            "timemillis": 1563467463580,
            "date": "18.7.2019",
            "time": "18:31:03,580",
            "name": "Player is loading",
            "data": ""
        },
        {
            "timemillis": 1563467463668,
            "date": "18.7.2019",
            "time": "18:31:03,668",
            "name": "Player is loaded",
            "data": "5"
        }
    ]
}
from pandas.io.json import json_normalize  # use pd.json_normalize in pandas >= 1.0

result = json_normalize(data, 'events')
print(result)
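The second positional argument is the record_path, naming the list to expand into rows. If the top level also carried fields that should be copied onto every row, the meta argument handles that; a sketch with a hypothetical "player" key added for illustration:

```python
import pandas as pd

data = {
    "player": "p1",  # hypothetical top-level field, for illustration only
    "events": [
        {"timemillis": 1563467463580, "name": "Player is loading", "data": ""},
        {"timemillis": 1563467463668, "name": "Player is loaded", "data": "5"},
    ],
}

# record_path picks the list to expand into rows; meta copies
# top-level fields onto every resulting row.
result = pd.json_normalize(data, record_path="events", meta="player")
print(result)
```

Each event becomes one row, with the "player" value repeated on both.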