Question
I would like to convert a list that appears to contain dictionaries (with other lists nested inside them) into a pandas DataFrame.
Here is a sample of my data:
['b"{',
'n boxers: [',
'n {',
'n age: 30,',
'n hasBoutScheduled: true,',
'n id: 489762,',
'n last6: [Array],',
"n name: 'Andy Ruiz Jr',",
'n points: 754,',
'n rating: 100,',
'n record: [Object],',
'n residence: [Object],',
"n stance: 'orthodox'",
'n },',
'n {',
'n age: 34,',
'n hasBoutScheduled: true,',
'n id: 468841,',
'n last6: [Array],',
"n name: 'Deontay Wilder',",
'n points: 622,',
'n rating: 100,',
'n record: [Object],',
'n residence: [Object],',
"n stance: 'orthodox'",
'n },',
'n {',
'n age: 30,',
'n hasBoutScheduled: true,',
'n id: 659461,',
'n last6: [Array],',
"n name: 'Anthony Joshua',",
'n points: 603,',
'n rating: 100,',
'n record: [Object],',
'n residence: [Object],',
"n stance: 'orthodox'",
'n },'
This is what I have tried thus far:
pd.DataFrame.from_records(unclean_file)
This produces about 27 columns - presumably a column for every space break, comma etc.
I have also tried using ChainMap:
from collections import ChainMap
pd.DataFrame.from_dict(ChainMap(*unclean_file),orient='index',columns=['age','hasBoutScheduled','id','last6','name','points','rating','record','residence','stance'])
This produces the error message: ValueError: dictionary update sequence element #0 has length 1; 2 is required
Note: I converted the data to a list when I extracted it. To clarify, I am using the Naked package to run a Node.js file that returns JSON output. I save the result to the variable success; its output is initially a bytes string, which I then convert to a list:
success = muterun_js('index.js')
unclean_file = [str(success.stdout).split('\\')]
Answer 1:
You're reading in data in JSON format, so it would make more sense to use unclean_file = json.loads(success.stdout) instead of unclean_file = [str(success.stdout).split('\\')].
This should return a dict object, which you can insert directly into a DataFrame.
You might also need to decode your data first.
import json
import pandas as pd
stdout_text = success.stdout.decode('utf-8')  # decode the bytes output; might not be necessary
unclean_file = json.loads(stdout_text)
data = pd.DataFrame(unclean_file['boxers'])  # one row per boxer
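As a minimal sketch of what json.loads accepts, the snippet below parses a strict JSON payload and then tries the console-style output from the question. Both byte strings are hypothetical stand-ins for success.stdout; the first assumes index.js prints JSON.stringify(...) output, which the question does not confirm.

```python
import json

# Hypothetical payload: what stdout might hold if index.js printed JSON.stringify(result)
valid_stdout = b'{"boxers": [{"age": 30, "name": "Andy Ruiz Jr", "stance": "orthodox"}]}'
records = json.loads(valid_stdout.decode('utf-8'))['boxers']

# Hypothetical payload shaped like the question's sample: unquoted keys, single-quoted strings
inspect_stdout = b"{\n  boxers: [\n    { age: 30, name: 'Andy Ruiz Jr' }\n  ]\n}"
try:
    json.loads(inspect_stdout.decode('utf-8'))
    error = None
except json.JSONDecodeError as exc:
    # Strict JSON requires double-quoted property names, so parsing stops at "boxers"
    error = exc
```

Here records is a list of dicts that pd.DataFrame(records) can take as rows, while error records where the parser gave up on the second payload.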
Answer 2:
Splitting the data string doesn't help - it makes it even harder to parse.
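To illustrate why the split output looks the way it does, here is a small sketch. The bytes value raw is a hypothetical stand-in for success.stdout, shortened from the question's sample.

```python
# Hypothetical stand-in for success.stdout
raw = b"{\n  boxers: [\n    {\n      age: 30,\n      name: 'Andy Ruiz Jr'\n    }\n  ]\n}\n"

# str() on a bytes object returns its repr, so every newline turns into the
# two characters backslash + "n"; splitting on the backslash leaves the "n" behind
pieces = str(raw).split('\\')

# decode() yields the actual text, with real newlines
text = raw.decode('utf-8')
lines = text.splitlines()
```

The first elements of pieces are 'b"{' and 'n  boxers: [', matching the list shown in the question, whereas lines starts with '{' and '  boxers: ['.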
error message: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 3 (char 4)
This clearly says that one problem is the unquoted keys; further problems are the unquoted values true, Array and Object. But it's not so hard to rectify all this:
import re
unclean_string = success.stdout.decode()  # bytes -> str, with real newlines
clean_string = re.sub(r'\w+(?=[],:])', r'"\g<0>"', unclean_string)
The above quotes all identifiers which are followed by :, , or ], and we get a well-formed dict representation, which we can evaluate and make a DataFrame of:
pd.DataFrame(eval(clean_string)['boxers'])
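A more cautious variant of the same idea, as a minimal sketch: ast.literal_eval only accepts Python literals, so it can stand in for eval here. The sample text and the helper name parse_node_output are hypothetical, chosen for illustration; the regular expression is the one used above.

```python
import ast
import re

# Hypothetical sample shaped like the decoded output in the question
sample = """{
  boxers: [
    {
      age: 30,
      hasBoutScheduled: true,
      last6: [Array],
      name: 'Andy Ruiz Jr',
      stance: 'orthodox'
    }
  ]
}"""

def parse_node_output(text):
    """Quote bare words followed by ']', ',' or ':' and parse the result as a literal."""
    quoted = re.sub(r'\w+(?=[],:])', r'"\g<0>"', text)
    # literal_eval parses dicts, lists and strings but never runs arbitrary code
    return ast.literal_eval(quoted)['boxers']

records = parse_node_output(sample)
```

Because the pattern also quotes numbers and true, those values come back as strings (for example '30'), so after pd.DataFrame(records) the numeric columns would still need a conversion such as astype.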
Source: https://stackoverflow.com/questions/58692255/converting-list-containing-other-lists-and-dictionaries-into-a-pandas-dataframe