I have the following data in a csv file:
from StringIO import StringIO
import pandas as pd
the_data = """
ABC,2016-6-9 0:00,95,{'//PurpleCar': [115L],
Edit: The file actually seems to be an escaped CSV, so we don't need custom parsing for this part.
As @Blckknght points out in the comment, the file is not a valid CSV. I'll make some assumptions in my answer. They are
First, some imports
import ast
import pandas as pd
We'll just split each row on its first three commas, since we don't need to deal with any sort of CSV escaping (assumptions #1 and #2).
rows = (line.split(",", 3) for line in the_data.splitlines() if line.strip() != "")
fixed_columns = pd.DataFrame.from_records(rows, columns=["Company", "Date", "Value", "Cars_str"])
# Alternatively, per the edit above, if the file is a properly escaped CSV:
fixed_columns = pd.read_csv(..., names=["Company", "Date", "Value", "Cars_str"])
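To see why capping the split at three commas is safe, here is a minimal sketch using a made-up row in the same shape as the data. The row contents and the variable names are assumptions for illustration, not values taken from the file.

```python
# Hypothetical row: three plain fields followed by a dict literal that contains commas.
sample_row = "ABC,2016-6-9 0:00,95,{'//PurpleCar': [115], '//BlueCar': [16]}"

# maxsplit=3 stops after the third comma, so the commas inside the dict are left alone.
company, date, value, cars_str = sample_row.split(",", 3)

print(company)
print(cars_str)
```

This only holds as long as the first three fields never contain a comma themselves.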
The first three columns are fixed, so we leave them as they are. The last column can be parsed with ast.literal_eval because it's a dict literal (assumption #3). IMO this is more readable than a regex and more flexible if the format changes. You'll also detect a format change earlier.
cars = fixed_columns["Cars_str"].apply(ast.literal_eval)
del fixed_columns["Cars_str"]
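One caveat: the sample data uses Python 2 long literals such as 115L, which ast.literal_eval rejects on Python 3. Below is a rough sketch of one way to cope with that, assuming an L that directly follows a digit is always a long suffix. The helper name parse_cars is hypothetical, and the regex would also strip an L from a key that happens to end in a digit plus L.

```python
import ast
import re

def parse_cars(cars_str):
    # Assumption: "L" right after a digit and before a non-word character
    # is a Python 2 long suffix, so it can be dropped.
    cleaned = re.sub(r"(?<=\d)L\b", "", cars_str)
    return ast.literal_eval(cleaned)

parsed = parse_cars("{'//PurpleCar': [115L], '//BlueCar': [16L]}")
print(parsed)

# A malformed value fails loudly instead of being silently mis-parsed.
try:
    parse_cars("{'//PurpleCar': [115L")
except (ValueError, SyntaxError) as exc:
    print("could not parse:", exc)
```

With such a helper, the apply call would use parse_cars in place of ast.literal_eval.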
This part is more of an answer to your other question.
We prepare functions to process the keys and values of the dict so that they fail if our assumptions about the dict's contents don't hold.
def get_single_item(list_that_always_has_single_item):
    v, = list_that_always_has_single_item
    return v

def extract_car_name(car_str):
    assert car_str.startswith("//"), car_str
    return car_str[2:]
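As a quick illustration of the fail-fast behaviour, the sketch below repeats the two helpers and feeds them inputs that break the assumptions. The sample inputs are made up for the example.

```python
def get_single_item(items):
    v, = items  # unpacking raises ValueError unless there is exactly one element
    return v

def extract_car_name(car_str):
    assert car_str.startswith("//"), car_str
    return car_str[2:]

print(extract_car_name("//BlueCar"))
print(get_single_item([16]))

try:
    get_single_item([16, 17])  # two items instead of one
except ValueError as exc:
    print("unexpected list length:", exc)

try:
    extract_car_name("BlueCar")  # missing the leading "//"
except AssertionError as exc:
    print("unexpected key format:", exc)
```

Note that assert statements are skipped when Python runs with -O, so an explicit raise would be the safer choice if that matters for you.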
We apply the functions and construct pd.Series which allow us to...
dynamic_columns = cars.apply(
    lambda x: pd.Series({
        extract_car_name(k): get_single_item(v)
        for k, v in x.items()
    }))
...add the columns to the dataframe
result = pd.concat([fixed_columns, dynamic_columns], axis=1)
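The same expand-and-concat step can be tried on a tiny, self-contained example. The frame and the dicts below are invented for illustration. The point is that rows with different keys still line up, with NaN filling the cells for cars a row doesn't have.

```python
import pandas as pd

# Hypothetical stand-ins for fixed_columns and the parsed dict column.
fixed = pd.DataFrame({"Company": ["ABC", "DEF"]})
cars = pd.Series([{"BlueCar": 16}, {"BlackCar": 15, "PinkCar": 4}])

# Returning a Series from apply turns each dict key into a column.
dynamic = cars.apply(lambda d: pd.Series(d))

# Both objects share the default 0..n-1 index, so axis=1 glues them side by side.
result = pd.concat([fixed, dynamic], axis=1)
print(result)
```

The alignment relies on both objects having the same index. If rows were filtered or reordered in between, the indexes would have to be reset first.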
result
Finally, we get the table:
   Company            Date  Value  BlackCar  BlueCar  NPO-GreenCar  PinkCar  \
0      ABC   2016-6-9 0:00     95       NaN     16.0           NaN      NaN
1      ABC  2016-6-10 0:00      0       NaN     90.0           NaN      NaN
2      ABC  2016-6-11 0:00      0       NaN     31.0           NaN      NaN
3      ABC  2016-6-12 0:00      0       NaN   8888.0           NaN      NaN
4      ABC  2016-6-13 0:00      0       NaN      4.0           NaN      NaN
5      DEF  2016-6-16 0:00      0      15.0      NaN           0.0      4.0
6      DEF  2016-6-17 0:00      0      15.0      NaN           0.0      4.0
7      DEF  2016-6-18 0:00      0      15.0      NaN           0.0      4.0
8      DEF  2016-6-19 0:00      0      15.0      NaN           0.0      4.0
9      DEF  2016-6-20 0:00      0      15.0      NaN           0.0      4.0

   PurpleCar  WhiteCar-XYZ  YellowCar
0      115.0           0.0      403.0
1      219.0           0.0      381.0
2      817.0           0.0       21.0
3       80.0           0.0     2011.0
4       32.0           0.0       15.0
5       32.0           NaN        NaN
6       32.0           NaN        NaN
7       32.0           NaN        NaN
8       32.0           NaN        NaN
9       32.0           NaN        NaN