I have a string list:
content
01/09/15, 10:07 - message1
01/09/15, 10:32 - message2
01/09/15, 10:44 - message3
I want a data frame, like:
You can use str.extract, where named groups become the column names:
In [5827]: df['content'].str.extract(r'(?P<date>[\s\S]+) - (?P<message>[\s\S]+)', expand=True)
Out[5827]:
date message
0 01/09/15, 10:07 message1
1 01/09/15, 10:32 message2
2 01/09/15, 10:44 message3
Details
In [5828]: df
Out[5828]:
content
0 01/09/15, 10:07 - message1
1 01/09/15, 10:32 - message2
2 01/09/15, 10:44 - message3
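For a reproducible version of the above, here is a minimal sketch. It assumes the strings start out in a plain Python list; the names `lines` and `parts` are hypothetical and not part of the original question.

```python
import pandas as pd

# Assumed input: the raw strings held in a plain Python list.
lines = [
    "01/09/15, 10:07 - message1",
    "01/09/15, 10:32 - message2",
    "01/09/15, 10:44 - message3",
]

# One-column frame, matching the `df` shown in the details above.
df = pd.DataFrame({"content": lines})

# Raw string so that \s reaches the regex engine unchanged.
# The named groups (?P<date>...) and (?P<message>...) become column names.
parts = df["content"].str.extract(
    r"(?P<date>[\s\S]+) - (?P<message>[\s\S]+)", expand=True
)
print(parts)
```

Because the first group is greedy, a message that itself contains " - " would be split at the last occurrence, so this pattern is best suited to input like the sample, where the separator appears once per line.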
Use str.split with the pattern \s+-\s+, where \s+ matches one or more whitespace characters:
df[['date','message']] = df['content'].str.split(r'\s+-\s+', expand=True)
print(df)
content date message
0 01/09/15, 10:07 - message1 01/09/15, 10:07 message1
1 01/09/15, 10:32 - message2 01/09/15, 10:32 message2
2 01/09/15, 10:44 - message3 01/09/15, 10:44 message3
If you need to remove the content column, use DataFrame.pop:
df[['date','message']] = df.pop('content').str.split(r'\s+-\s+', expand=True)
print(df)
date message
0 01/09/15, 10:07 message1
1 01/09/15, 10:32 message2
2 01/09/15, 10:44 message3
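One case the sample data does not cover is a message that contains " - " itself. As a hedged sketch, passing n=1 to str.split limits it to the first separator, so the rest of the line stays in the message column. The second input string below is made up for illustration, and `parts` is a hypothetical name.

```python
import pandas as pd

# The second row is an invented example whose message contains " - ".
df = pd.DataFrame({
    "content": [
        "01/09/15, 10:07 - message1",
        "01/09/15, 10:32 - see you - later",
    ]
})

# n=1 stops after the first match of the separator pattern,
# so expand=True always yields exactly two columns here.
parts = df["content"].str.split(r"\s+-\s+", n=1, expand=True)
parts.columns = ["date", "message"]
print(parts)
```

Without n=1, the second row would produce a third column and the two-column assignment shown above would fail.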