I have the following data in a csv file:
from StringIO import StringIO
import pandas as pd
the_data = """
ABC,2016-6-9 0:00,95,{'//PurpleCar': [115L],
I think it's better to convert each string into two columns, one for the color and one for the value:
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(the_data), sep=',', header=None)
df.columns = ['Company','Date','Volume','Car1','Car2','Car3','Car4']
cars = ["Car1", "Car2", "Car3", "Car4"]
# the named groups become the column names 'color' and 'value'
pattern = r"//(?P<color>.+?)':.*?(?P<value>\d+)"
df2 = pd.concat([df[col].str
                        .extract(pattern)
                        .assign(value=lambda self: pd.to_numeric(self["value"]))
                 for col in cars],
                axis=1, keys=cars)
The result (df2):
        Car1              Car2            Car3                 Car4
       color  value      color  value    color  value         color  value
0  PurpleCar    115  YellowCar    403  BlueCar     16  WhiteCar-XYZ      0
1  PurpleCar    219  YellowCar    381  BlueCar     90  WhiteCar-XYZ      0
2  PurpleCar    817  YellowCar     21  BlueCar     31  WhiteCar-XYZ      0
3  PurpleCar     80  YellowCar   2011  BlueCar   8888  WhiteCar-XYZ      0
4  PurpleCar     32  YellowCar     15  BlueCar      4  WhiteCar-XYZ      0
5  PurpleCar     32   BlackCar     15  PinkCar      4  NPO-GreenCar      0
6  PurpleCar     32   BlackCar     15  PinkCar      4  NPO-GreenCar      0
7  PurpleCar     32   BlackCar     15  PinkCar      4  NPO-GreenCar      0
8  PurpleCar     32   BlackCar     15  PinkCar      4  NPO-GreenCar      0
9  PurpleCar     32   BlackCar     15  PinkCar      4  NPO-GreenCar      0
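Once the strings are split like this, the two-level column header can be sliced or flattened. The following is only a sketch of one way to do that. The single input row is an assumption: it is reconstructed from the first line of the question and the first row of the result above, because the full file is not shown. The names `values`, `colors`, `flat` and `out`, and the `Car1_color` naming scheme, are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Assumed sample: one row rebuilt from the question's first line and the
# first row of the result table. The real file has more rows.
the_data = """ABC,2016-6-9 0:00,95,{'//PurpleCar': [115L], '//YellowCar': [403L], '//BlueCar': [16L], '//WhiteCar-XYZ': [0L]}
"""

df = pd.read_csv(StringIO(the_data), sep=",", header=None)
df.columns = ["Company", "Date", "Volume", "Car1", "Car2", "Car3", "Car4"]

cars = ["Car1", "Car2", "Car3", "Car4"]
# color: lazy match from '//' up to the closing quote; value: first run of digits after it
pattern = r"//(?P<color>.+?)':.*?(?P<value>\d+)"
df2 = pd.concat(
    [df[col].str.extract(pattern)
            .assign(value=lambda d: pd.to_numeric(d["value"]))
     for col in cars],
    axis=1, keys=cars,
)

# Cross-section on the inner column level: one column per car slot.
values = df2.xs("value", axis=1, level=1)
colors = df2.xs("color", axis=1, level=1)

# Flatten the two-level header (hypothetical naming: "Car1_color", "Car1_value", ...)
# and put the parsed columns next to the untouched ones.
flat = df2.copy()
flat.columns = [f"{car}_{field}" for car, field in flat.columns]
out = df[["Company", "Date", "Volume"]].join(flat)

print(values)
print(out)
```

Selecting a single slot works the same way: `df2["Car1"]` returns just the `color` and `value` columns for that car.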