I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numbers?
Assuming you have a column which contains tuples (as it appears in your example) rather than strings, this will work:
import pandas
from operator import itemgetter
df = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': [(1, 2, 3, 4, 5), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5)]}).set_index("Track ID")
# Pull the i-th element of each tuple into its own column
for i in range(5):
    df["Col {}".format(i)] = df.stats.apply(itemgetter(i))
If you actually have strings that look like tuples, you can parse them into lists first and then apply the same pattern as above (the list elements are still strings, so convert them with astype(float) if you need numbers):
df = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': ["(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)"]}).set_index("Track ID")
df["stats"] = df["stats"].str.strip("()").str.split(", ")
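For completeness, here is a minimal sketch of the whole string case in one piece: parse, pick each element with itemgetter, and convert to float. The sample values and the "Col N" column names are only illustrative, and it assumes every string holds exactly 5 values separated by ", ".

```python
from operator import itemgetter

import pandas

df = pandas.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": ["(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)"]}
).set_index("Track ID")

# Parse each string into a list of string fragments: "(1, 2)" -> ["1", "2"]
parsed = df["stats"].str.strip("()").str.split(", ")

for i in range(5):
    # Pick the i-th fragment of every row and convert it to a number
    df["Col {}".format(i)] = parsed.apply(itemgetter(i)).astype(float)
```

The original stats column is left untouched here, so you can drop it afterwards if you no longer need it.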
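If you have a sequence of tuples and not strings, and you want them as DataFrame columns, this is the simplest approach (it assumes Track ID is a regular column and the index is the default integer one; call df.reset_index() first if Track ID is the index):
import pandas as pd
df = pd.concat([df['Track ID'], pd.DataFrame(df['stats'].values.tolist())], axis=1)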
If they are actually strings, you can first convert them to lists like so, then apply the same concat step:
dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).values.tolist()).astype(float)
df = pd.concat([df['Track ID'], dfpart], axis=1)
Assuming they are strings that look like tuples, you can strip the parentheses and split on the commas:
In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
Out[74]:
          0         1         2         3         4
0 -0.009242  0.410000 -0.742016  0.003683  0.002517
1  0.041154  0.318231  0.758717  0.002640  0.010654
2 -0.014435  0.168438 -0.808703  0.000817  0.003166
3  0.034346  0.288731  0.950845  0.000001  0.003373
4  0.009052  0.151031  0.670257  0.012179  0.003022
5 -0.004797  0.171615 -0.552879  0.050032  0.002180
(Note: for older versions of pandas (< 0.16.1), you need to use return_type='frame' instead of the expand keyword.)
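A self-contained version of the same call may make it easier to try out. This is only a sketch: the strings are rebuilt from the first two output rows above, and the Track ID values are made up for illustration.

```python
import pandas as pd

# Illustrative data only: values copied from the first two output rows above,
# Track IDs invented for this example.
df = pd.DataFrame(
    {"Track ID": [14, 28],
     "stats": ["(-0.009242, 0.410000, -0.742016, 0.003683, 0.002517)",
               "(0.041154, 0.318231, 0.758717, 0.002640, 0.010654)"]}
).set_index("Track ID")

# Drop the surrounding parentheses, split on commas into separate columns,
# then convert every fragment to float (the leading spaces are tolerated).
out = df["stats"].str[1:-1].str.split(",", expand=True).astype(float)
```

Because the split is done on the Series itself, the result keeps the Track ID index of the original frame.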
By the way, if they are tuples and not strings, you can simply do the following:
pd.DataFrame(df['stats'].tolist(), index=df.index)
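If you also want the new columns attached to the original frame, one possible way is to join the result back on the index. This is a sketch under assumptions: the sample values and the stat_N column names are hypothetical and not part of the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]}
).set_index("Track ID")

# Hypothetical column names, chosen only for this illustration
names = ["stat_{}".format(i) for i in range(5)]

# One row per tuple, reusing the Track ID index so the rows line up
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index, columns=names)

# Attach the new columns and drop the original tuple column
result = df.drop(columns="stats").join(expanded)
```

Passing index=df.index is what keeps the rows aligned; without it the new frame would get a fresh 0..n-1 index and the join would not match the Track IDs.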