json_normalize JSON file with list containing dictionary (sample included)

白昼怎懂夜的黑 submitted on 2019-12-20 04:58:35

Question


This is a sample JSON file I'm working with, containing 2 records:

[{"Time":"2016-01-10",
"ID":13567,
"Content":{
    "Event":"UPDATE",
    "Id":{"EventID":"ABCDEFG"},
    "Story":[{
        "@ContentCat":"News",
        "Body":"Related Meeting Memo: Engagement with target firm for potential M&A.  Please be on call this weekend for news updates.",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Amy","Score":70}, {"Id":"Jon","Score":70}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Performance","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":50,
                    "Headline":"Attn: Weekend",
                    "WireId":2035,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160110",
"month":201601},
{"Time":"2016-01-12",
"ID":13568,
"Content":{
    "Event":"DEAL",
    "Id":{"EventID":"ABCDEFG2"},
    "Story":[{
        "@ContentCat":"Details",
        "Body":"Test email contents",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Bob","Score":100}, {"Id":"Jon","Score":70}, {"Id":"Jack","Score":60}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":70,
                    "Headline":"Attn: Weekend",
                    "WireId":2037,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160112",
"month":201602}]

I'm trying to get to a dataframe at the level of the entity IDs (extracting Amy and Jon from record 1 and Bob, Jon, Jack from record 2).

However, I'm already getting an error early on. Here's my code so far, assuming the sample JSON is saved as sample.json:

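import json
from pandas.io.json import json_normalize  # pandas < 1.0; later versions use: from pandas import json_normalize
data = json.load(open('sample.json'))
test = json_normalize(data, record_path=['Content', 'Story'])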

Results in this error:

TypeError: string indices must be integers

I suspect it's because Content.Story is actually a list containing a dictionary, instead of a dictionary itself. But it's not clear to me how to get past this.
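
The suspicion can be checked on the data itself, without pandas. The sketch below uses a trimmed copy of the first record (only the keys along the path are kept, which is a simplification made here) and shows that Content is a dictionary while Story is a list. It also shows one plausible source of the message, under the assumption that json_normalize loops over each intermediate level of record_path as if it were a list of records: looping over a dictionary yields its string keys, and indexing a string with 'Story' raises a TypeError.

```python
import json

# Trimmed copy of the first sample record: only the keys on the path are kept.
sample = json.loads("""
[{"ID": 13567,
  "Content": {"Story": [{"DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                                   {"Id": "Jon", "Score": 70}]}}]}}]
""")

content = sample[0]["Content"]
story = content["Story"]

print(type(content).__name__)  # Content is a dict
print(type(story).__name__)    # Story is a list of dicts

# Iterating over a dict yields its keys, which are strings.
first_key = next(iter(content))

# Indexing a string with another string raises TypeError.
error_type = None
try:
    first_key["Story"]
except TypeError as exc:
    error_type = type(exc).__name__

print(error_type)
```

The exact wording of the TypeError message varies between Python versions, so only the exception type is inspected here.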

EDIT: To clarify, I'm ultimately trying to get to the level of the entity IDs (Content > Story > DerivedId > Entity > Id). I was showing the Content.Story code example just to illustrate where I am right now in figuring this out.


Answer 1:


json_normalize(data, record_path=[['Content', 'Story']])

That should work.
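
The call above stops at the Story level, while the question ultimately asks for one row per entity. A minimal sketch of one way to get there, assuming it is acceptable to walk the nesting by hand rather than through json_normalize: the helper name `entity_rows` and the output column names `EntityId` and `Score` are hypothetical choices made for this example, and the sample is trimmed to the keys that are actually used.

```python
import json

# Both sample records, trimmed to the keys used below.
sample = json.loads("""
[{"Time": "2016-01-10", "ID": 13567,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]}}]}},
 {"Time": "2016-01-12", "ID": 13568,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70},
      {"Id": "Jack", "Score": 60}]}}]}}]
""")


def entity_rows(records):
    """Yield one flat dict per entity, carrying the parent record's ID and Time."""
    for rec in records:
        # Content is a dict, Story is a list of dicts.
        for story in rec["Content"]["Story"]:
            # DerivedId is a dict, Entity is a list of dicts.
            for ent in story["DerivedId"]["Entity"]:
                yield {
                    "ID": rec["ID"],
                    "Time": rec["Time"],
                    "EntityId": ent["Id"],
                    "Score": ent["Score"],
                }


rows = list(entity_rows(sample))
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame(rows)` then gives a dataframe with one row per entity, with the record-level ID and Time repeated on each row.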



Source: https://stackoverflow.com/questions/51236433/json-normalize-json-file-with-list-containing-dictionary-sample-included
