Reading multiple JSON records into a Pandas dataframe

粉色の甜心 2020-12-04 17:41

I'd like to know if there is a memory-efficient way of reading a multi-record JSON file (each line is a JSON dict) into a pandas dataframe. Below is a 2 line example with wo

4 answers
  •  借酒劲吻你
    2020-12-04 18:02

    ++++++++Update++++++++++++++

    As of v0.19, Pandas supports this natively (see https://github.com/pandas-dev/pandas/pull/13351). Just run:

    import pandas as pd
    df = pd.read_json('test.json', lines=True)
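
Since the question asks about memory use, it may also help to read the file in pieces. The sketch below assumes a pandas version in which `read_json` accepts `chunksize` together with `lines=True` (it then returns an iterator of dataframes instead of a single one). The sample records, the column names `a` and `b`, and the temporary file are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one JSON object per line.
sample = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n{"a": 3, "b": "z"}\n'

rows_seen = 0
total_a = 0
n_chunks = 0

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    with open(path, "w") as f:
        f.write(sample)

    # chunksize only works with lines=True; each chunk is a small DataFrame.
    reader = pd.read_json(path, lines=True, chunksize=2)
    for chunk in reader:
        # Process each chunk and discard it, so only one chunk is held at a time.
        n_chunks += 1
        rows_seen += len(chunk)
        total_a += int(chunk["a"].sum())

print(rows_seen, total_a)
```

If the whole dataframe is needed anyway, the chunks can be filtered or down-cast one at a time and then combined with `pd.concat`, which may still lower peak memory compared with a single read.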
    

    ++++++++Old Answer++++++++++

    The existing answers are good, but for a little variety, here is another way to accomplish your goal. It requires a simple pre-processing step outside of Python so that pd.read_json() can consume the data.

    • Install jq https://stedolan.github.io/jq/.
    • Create a valid JSON file with cat test.json | jq -c --slurp . > valid_test.json
    • Create dataframe with df=pd.read_json('valid_test.json')

    In an IPython notebook, you can run the shell command directly from the cell interface with

    !cat test.json | jq -c --slurp . > valid_test.json
    df=pd.read_json('valid_test.json')
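
If jq is not available, roughly the same "slurp" step can be sketched in plain Python. This is only an illustration, not part of the original answer: the helper name `read_json_lines` and the sample records are hypothetical, and like the jq approach it holds every record in memory at once.

```python
import io
import json

import pandas as pd


def read_json_lines(handle):
    """Parse one JSON object per line and build a DataFrame (hypothetical helper)."""
    # Skip blank lines so a trailing newline does not raise a decode error.
    records = [json.loads(line) for line in handle if line.strip()]
    return pd.DataFrame(records)


# Hypothetical stand-in for open('test.json').
sample = io.StringIO('{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n')
df = read_json_lines(sample)
print(df)
```

With a real file, the call would look like `with open('test.json') as f: df = read_json_lines(f)`.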
    
