Question
I have a text file of Clash Royale stats from Kaggle. Each line is a Python dictionary. I am struggling to read it into a DataFrame in a meaningful way, and I'm curious what the best approach is. Each dictionary is fairly complex, with nested dictionaries and lists.
Original Dataset here: https://www.kaggle.com/s1m0n38/clash-royale-matches-dataset
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'}
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}
Answer 1:
I saved your data to a .json file, then looped through each line, treating each one as its own JSON document, and used json_normalize from pandas to load it into a DataFrame. I made some guesses at how you wanted the df to look, but I came up with this:
Note: proper JSON needs double quotes, not single quotes, so I used replace to work around this. Be careful that no data inside is destroyed by this; a value that itself contains an apostrophe, for example, would be corrupted.
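To illustrate the caveat above, here is a minimal sketch with a made-up record (the name O'Neil is hypothetical, not taken from the dataset). Python writes a string containing an apostrophe with double quotes, so a blanket replace yields invalid JSON, whereas ast.literal_eval from the standard library parses the original line directly:

```python
import json
from ast import literal_eval

# Hypothetical record: the name contains an apostrophe, so Python's repr
# wraps that one string in double quotes.
line = "{'name': \"O'Neil\", 'trophy': '4258'}"

try:
    json.loads(line.replace("'", '"'))
    replace_worked = True
except json.JSONDecodeError:
    # the apostrophe turned into a stray double quote inside the string
    replace_worked = False

# literal_eval parses Python literals as written, with no quote swapping
record = literal_eval(line)
print(replace_worked, record['name'])
```

If the real file contains such values, parsing with literal_eval instead of json.loads sidesteps the problem.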
Note: The way I got this to work, I had to merge 'right' and 'left', so you lose the information about which side each player was on. If you need it, you could use a dict comprehension as a workaround.
import json
import pandas as pd

# pandas >= 1.0 exposes this function as pd.json_normalize; on older versions
# use `from pandas.io.json import json_normalize` instead
with open('cr.json', 'r') as f:
    df = None
    for line in f:
        # swap single quotes for double quotes so the line parses as JSON
        data = json.loads(line.replace("'", '"'))
        # needed to put the right and left keys together; maybe you can find a way around this, I wasn't able to
        df1 = pd.json_normalize([data['players']['right'], data['players']['left']],
                                'deck',
                                ['name', 'trophy', 'clan'],
                                meta_prefix='player.',
                                errors='ignore')
        # pd.concat silently drops the initial None on the first pass
        df = pd.concat([df, df1])

df.rename(columns={0: 'player.troop.name', 1: 'player.troop.level'},
          inplace=True)
print(df)
This prints:
player.troop.name player.troop.level player.name player.clan \
0 Mega Minion 9 gpa raid TwoFiveOne
1 Electro Wizard 3 gpa raid TwoFiveOne
2 Arrows 11 gpa raid TwoFiveOne
3 Lightning 5 gpa raid TwoFiveOne
4 Tombstone 9 gpa raid TwoFiveOne
5 The Log 2 gpa raid TwoFiveOne
6 Giant 9 gpa raid TwoFiveOne
7 Bowler 5 gpa raid TwoFiveOne
8 Fireball 9 Supr4 battusai
9 Archers 12 Supr4 battusai
10 Goblins 12 Supr4 battusai
11 Minions 11 Supr4 battusai
12 Bomber 12 Supr4 battusai
13 The Log 2 Supr4 battusai
14 Barbarians 12 Supr4 battusai
15 Royal Giant 13 Supr4 battusai
0 Ice Spirit 10 TITAN The Wolves
1 Valkyrie 9 TITAN The Wolves
2 Hog Rider 9 TITAN The Wolves
3 Inferno Tower 9 TITAN The Wolves
4 Goblins 12 TITAN The Wolves
5 Musketeer 9 TITAN The Wolves
6 Zap 12 TITAN The Wolves
7 Fireball 9 TITAN The Wolves
8 Royal Giant 13 Supr4 battusai
9 Ice Wizard 2 Supr4 battusai
10 Bomber 12 Supr4 battusai
11 Knight 12 Supr4 battusai
12 Fireball 9 Supr4 battusai
13 Barbarians 12 Supr4 battusai
14 The Log 2 Supr4 battusai
15 Archers 12 Supr4 battusai
0 Miner 3 Victor @LA PERLA NEGRA
1 Ice Golem 9 Victor @LA PERLA NEGRA
2 Spear Goblins 12 Victor @LA PERLA NEGRA
3 Minion Horde 12 Victor @LA PERLA NEGRA
4 Inferno Tower 8 Victor @LA PERLA NEGRA
5 The Log 2 Victor @LA PERLA NEGRA
6 Skeleton Army 6 Victor @LA PERLA NEGRA
7 Fireball 10 Victor @LA PERLA NEGRA
8 Royal Giant 13 Supr4 battusai
9 Ice Wizard 2 Supr4 battusai
10 Bomber 12 Supr4 battusai
11 Knight 12 Supr4 battusai
12 Fireball 9 Supr4 battusai
13 Barbarians 12 Supr4 battusai
14 The Log 2 Supr4 battusai
15 Archers 12 Supr4 battusai
player.trophy
0 4258
1 4258
2 4258
3 4258
4 4258
5 4258
6 4258
7 4258
8 4325
9 4325
10 4325
11 4325
12 4325
13 4325
14 4325
15 4325
0 4237
1 4237
2 4237
3 4237
4 4237
5 4237
6 4237
7 4237
8 4296
9 4296
10 4296
11 4296
12 4296
13 4296
14 4296
15 4296
0 4300
1 4300
2 4300
3 4300
4 4300
5 4300
6 4300
7 4300
8 4267
9 4267
10 4267
11 4267
12 4267
13 4267
14 4267
15 4267
And df.iloc[0] is as follows:
player.troop.name Mega Minion
player.troop.level 9
player.name gpa raid
player.trophy 4258
player.clan TwoFiveOne
Name: 0, dtype: object
You can rework the json_normalize parameters however you see fit, but I hope this is more than enough to get you going.
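One possible way to keep the side information mentioned in the second note is to copy each player dict with an extra key before normalizing. This is only a sketch and not part of the original answer: the 'side' key and the shortened two-card decks are assumptions made for illustration, it uses a list comprehension with dict unpacking rather than a plain dict comprehension, and pd.json_normalize assumes pandas 1.0 or later.

```python
import pandas as pd

# Shortened stand-in for one parsed line of the dataset (decks cut to two cards)
data = {'players': {'right': {'deck': [['Mega Minion', '9'], ['Arrows', '11']],
                              'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'},
                    'left': {'deck': [['Fireball', '9'], ['Archers', '12']],
                             'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}},
        'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'}

# Copy each player dict and tag it with the side it came from
# ('side' is a hypothetical key added for this example)
players = [{**info, 'side': side} for side, info in data['players'].items()]

df = pd.json_normalize(players, record_path='deck',
                       meta=['side', 'name', 'trophy', 'clan'],
                       meta_prefix='player.')
df = df.rename(columns={0: 'player.troop.name', 1: 'player.troop.level'})
print(df)
```

Each card row then carries a player.side column alongside the other player fields.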
Answer 2:
According to this dataset's synopsis on Kaggle, each dictionary represents a match between two players. I felt it would make sense to have each row in the dataframe represent all the characteristics of a single match.
This can be accomplished in a few short steps.
- Store all the match dictionaries (each row of the dataset from Kaggle) inside one list:
matches = [
{'players': {'right': {'deck': [['Mega Minion', '9'], ['Electro Wizard', '3'], ['Arrows', '11'], ['Lightning', '5'], ['Tombstone', '9'], ['The Log', '2'], ['Giant', '9'], ['Bowler', '5']], 'trophy': '4258', 'clan': 'TwoFiveOne', 'name': 'gpa raid'}, 'left': {'deck': [['Fireball', '9'], ['Archers', '12'], ['Goblins', '12'], ['Minions', '11'], ['Bomber', '12'], ['The Log', '2'], ['Barbarians', '12'], ['Royal Giant', '13']], 'trophy': '4325', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['2', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Ice Spirit', '10'], ['Valkyrie', '9'], ['Hog Rider', '9'], ['Inferno Tower', '9'], ['Goblins', '12'], ['Musketeer', '9'], ['Zap', '12'], ['Fireball', '9']], 'trophy': '4237', 'clan': 'The Wolves', 'name': 'TITAN'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4296', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['1', '0'], 'time': '2017-07-12'},
{'players': {'right': {'deck': [['Miner', '3'], ['Ice Golem', '9'], ['Spear Goblins', '12'], ['Minion Horde', '12'], ['Inferno Tower', '8'], ['The Log', '2'], ['Skeleton Army', '6'], ['Fireball', '10']], 'trophy': '4300', 'clan': '@LA PERLA NEGRA', 'name': 'Victor'}, 'left': {'deck': [['Royal Giant', '13'], ['Ice Wizard', '2'], ['Bomber', '12'], ['Knight', '12'], ['Fireball', '9'], ['Barbarians', '12'], ['The Log', '2'], ['Archers', '12']], 'trophy': '4267', 'clan': 'battusai', 'name': 'Supr4'}}, 'type': 'ladder', 'result': ['0', '1'], 'time': '2017-07-12'}
]
- Create a dataframe from the above list, which will automatically populate columns that contain info for the type, time, and result of the match:
import pandas as pd

df = pd.DataFrame(matches)
- Then, use some simple logic to populate columns containing info on the deck, trophy, clan, and name of both the left and right players in the match:
sides = ['right', 'left']
player_keys = ['deck', 'trophy', 'clan', 'name']

for side in sides:
    for key in player_keys:
        # apply already visits every row, so no explicit row loop is needed
        df[side + '_' + key] = df['players'].apply(lambda x: x[side][key])

df = df.drop('players', axis=1)  # no longer need this after populating the other columns
df = df.iloc[:, ::-1]  # made sense to display columns in order of player info from left to right,
                       # followed by general match info at the far right of the dataframe
The resulting dataframe looks like this:
left_name left_clan left_trophy left_deck right_name right_clan right_trophy right_deck type time result
0 Supr4 battusai 4325 [[Fireball, 9], [Archers, 12], [Goblins, 12], ... gpa raid TwoFiveOne 4258 [[Mega Minion, 9], [Electro Wizard, 3], [Arrow... ladder 2017-07-12 [2, 0]
1 Supr4 battusai 4296 [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ... TITAN The Wolves 4237 [[Ice Spirit, 10], [Valkyrie, 9], [Hog Rider, ... ladder 2017-07-12 [1, 0]
2 Supr4 battusai 4267 [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, ... Victor @LA PERLA NEGRA 4300 [[Miner, 3], [Ice Golem, 9], [Spear Goblins, 1... ladder 2017-07-12 [0, 1]
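Every value in the dataset is stored as a string, so a possible follow-up, not covered in the answer above, is converting the columns that hold numbers and dates. The sketch below uses a small hand-built frame with the same column names; the trophy_gap column is a hypothetical addition used only to show that arithmetic works after the conversion.

```python
import pandas as pd

# Hand-built stand-in with the same column names as the frame above
df = pd.DataFrame({
    'left_trophy': ['4325', '4296'],
    'right_trophy': ['4258', '4237'],
    'time': ['2017-07-12', '2017-07-12'],
})

# Trophy counts arrive as strings; convert them to integers
for col in ['left_trophy', 'right_trophy']:
    df[col] = pd.to_numeric(df[col])

# Parse the match date into a proper datetime column
df['time'] = pd.to_datetime(df['time'])

# Hypothetical derived column: trophy difference between the two players
df['trophy_gap'] = df['left_trophy'] - df['right_trophy']
print(df.dtypes)
```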
Answer 3:
- Given your sample in a file called test.txt, which will be rows of dictionaries.
  - The data is not in JSON format and does not need to be converted to that format.
- Read the file in, which will make each row a str type
- Convert it from str to dict type with ast.literal_eval
- Convert the list of dicts to a dataframe with pandas.json_normalize
import pandas as pd
from ast import literal_eval

with open('test.txt', 'r', encoding='utf-8') as f:  # read in the file
    list_of_rows = [literal_eval(row) for row in f.readlines()]  # use a list comprehension to convert each row from str to dict

# convert to a dataframe
df = pd.json_normalize(list_of_rows)

# display(df)
type result time players.right.deck players.right.trophy players.right.clan players.right.name players.left.deck players.left.trophy players.left.clan players.left.name
0 ladder [2, 0] 2017-07-12 [[Mega Minion, 9], [Electro Wizard, 3], [Arrows, 11], [Lightning, 5], [Tombstone, 9], [The Log, 2], [Giant, 9], [Bowler, 5]] 4258 TwoFiveOne gpa raid [[Fireball, 9], [Archers, 12], [Goblins, 12], [Minions, 11], [Bomber, 12], [The Log, 2], [Barbarians, 12], [Royal Giant, 13]] 4325 battusai Supr4
1 ladder [1, 0] 2017-07-12 [[Ice Spirit, 10], [Valkyrie, 9], [Hog Rider, 9], [Inferno Tower, 9], [Goblins, 12], [Musketeer, 9], [Zap, 12], [Fireball, 9]] 4237 The Wolves TITAN [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, 12], [Knight, 12], [Fireball, 9], [Barbarians, 12], [The Log, 2], [Archers, 12]] 4296 battusai Supr4
2 ladder [0, 1] 2017-07-12 [[Miner, 3], [Ice Golem, 9], [Spear Goblins, 12], [Minion Horde, 12], [Inferno Tower, 8], [The Log, 2], [Skeleton Army, 6], [Fireball, 10]] 4300 @LA PERLA NEGRA Victor [[Royal Giant, 13], [Ice Wizard, 2], [Bomber, 12], [Knight, 12], [Fireball, 9], [Barbarians, 12], [The Log, 2], [Archers, 12]] 4267 battusai Supr4
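A small variation on the reading step, sketched under the assumption that the file may contain blank lines (an empty line between records, for example), which literal_eval would reject with a SyntaxError. The temporary file only stands in for test.txt so the snippet runs on its own, and its records are shortened versions of the sample.

```python
import os
import tempfile
from ast import literal_eval

import pandas as pd

# Two shortened records with a blank line in between
sample = (
    "{'players': {'right': {'trophy': '4258', 'name': 'gpa raid'}, "
    "'left': {'trophy': '4325', 'name': 'Supr4'}}, 'type': 'ladder'}\n"
    "\n"
    "{'players': {'right': {'trophy': '4237', 'name': 'TITAN'}, "
    "'left': {'trophy': '4296', 'name': 'Supr4'}}, 'type': 'ladder'}\n"
)

# Write the sample to a temporary file that plays the role of test.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, 'r', encoding='utf-8') as f:
    # skip lines that are empty or whitespace-only before parsing
    rows = [literal_eval(line) for line in f if line.strip()]
os.remove(path)

df = pd.json_normalize(rows)
print(df.columns.tolist())
```

Iterating over the file object directly also avoids building the intermediate list that readlines() creates.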
Source: https://stackoverflow.com/questions/54489615/how-to-read-a-text-file-of-dictionaries-into-a-dataframe