pandas - convert string into list of strings

前端未结


I have this 'file.csv' file to read with pandas:

Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"

using


6 Answers

一个人的身影

2020-12-15 23:17
Your df['Tags'] column holds strings, not lists. If you print its values you should get ["[Tag1,Tag2]", "[Tag1,Tag2,Tag3]", "[Tag3,Tag1]"]. This is why taking the first element of the first element gives you the first character of the string rather than the first tag.
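To make the difference concrete, here is a minimal sketch in plain Python. The variable names are illustrative, and the string is assumed to be what one cell of the Tags column holds after reading the sample file.

```python
# One cell of the Tags column as read from the CSV: a plain string,
# not a list (value assumed from the sample file in the question).
cell = "[Tag1,Tag2]"

# Indexing a string yields a single character.
first_char = cell[0]

# Indexing a real list yields a whole element.
parsed = ["Tag1", "Tag2"]
first_tag = parsed[0]

print(type(cell).__name__, repr(first_char))   # str '['
print(type(parsed).__name__, repr(first_tag))  # list 'Tag1'
```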

You need to parse that string afterward, for example by splitting it:
```
tags = df['Tags'][0].split(',')
```
But as you saw in your cited example, this gives you a list whose first element still carries the opening bracket:
```
in: tags[0]
out: '[Tag1'
```
What you need is a way to parse the string while dropping several different characters at once. A simple regular expression can do this. Something like:
```
import re

# Extract every run of word characters from the first cell
first_tags = re.findall(r"[\w']+", df['Tags'][0])
print(first_tags[0])
```
will print:
```
Tag1
```
If you use the pandas converters approach from the other answer, you might write a converter like this:
```
import re

def clean(seq_string):
    # Return every run of word characters as a separate list item
    return re.findall(r"[\w']+", seq_string)
```
Regular expressions are powerful, but they can behave unpredictably if you're not sure about the content of your input strings. The expression used here, r"[\w']+", matches runs of word characters (letters, digits, and underscores) and apostrophes; everything else acts as a separator between the items re.findall returns.
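As a rough end-to-end sketch of the converter idea, the block below reads the sample data through the `converters` parameter of `pd.read_csv`. The inline `csv_text` string stands in for 'file.csv' and is an assumption made so the example is self-contained.

```python
import io
import re

import pandas as pd

# Inline copy of the sample file so the sketch runs without 'file.csv'.
csv_text = (
    'Title|Tags\n'
    'T1|"[Tag1,Tag2]"\n'
    'T1|"[Tag1,Tag2,Tag3]"\n'
    'T2|"[Tag3,Tag1]"\n'
)

def clean(seq_string):
    # Collect every run of word characters (or apostrophes) as one tag.
    return re.findall(r"[\w']+", seq_string)

# converters maps a column name to a function applied to each raw cell.
df = pd.read_csv(io.StringIO(csv_text), sep="|", converters={"Tags": clean})

print(df["Tags"][0])     # ['Tag1', 'Tag2']
print(df["Tags"][0][0])  # Tag1
```

Keep in mind that with this pattern a tag containing a space or hyphen would be broken into several pieces, so it suits simple alphanumeric tags best.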
盖世英雄少女心

2020-12-15 23:25
You can split the string manually:
```
>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
```
独厮守ぢ

2020-12-15 23:27
You can convert the string to a list using strip and split.
```
df_out = df.assign(Tags=df.Tags.str.strip('[]').str.split(','))

df_out.Tags[0][0]
```
Output:
```
'Tag1'
```
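For anyone who wants to try this without the CSV file, here is a self-contained sketch of the same strip-and-split approach. The inline DataFrame is an assumption that mirrors the sample data.

```python
import pandas as pd

# Stand-in for the DataFrame read from 'file.csv'.
df = pd.DataFrame({
    "Title": ["T1", "T1", "T2"],
    "Tags": ["[Tag1,Tag2]", "[Tag1,Tag2,Tag3]", "[Tag3,Tag1]"],
})

# str.strip('[]') removes leading/trailing '[' and ']' characters;
# str.split(',') then breaks the remainder on commas.
df_out = df.assign(Tags=df["Tags"].str.strip("[]").str.split(","))

first_row = list(df_out["Tags"][0])
print(first_row)  # ['Tag1', 'Tag2']
```

Because `assign` returns a new DataFrame, the original `df` keeps its string column, which can be handy for comparing before and after.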
孤街浪徒

2020-12-15 23:29
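Here's a simple yet performant operation. Note that it splits on commas only, so the first and last tags keep their brackets ('[Tag1', 'Tag2]'):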
```
df['Tags'].str.split(',')
```
無奈伤痛

2020-12-15 23:38
Or
```
df.Tags = df.Tags.str[1:-1].str.split(',').tolist()
```
迷失自我

2020-12-15 23:44
I think you could use the json module.
```
import json
import pandas as pd

df = pd.read_csv('file.csv', sep='|')
df['Tags'] = df['Tags'].apply(lambda x: json.loads(x))
```
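This loads your DataFrame as before, then applies a lambda function to each item in the Tags column. The lambda calls json.loads(), which converts the string representation of a list into an actual list. Note that json.loads() requires valid JSON, so the tags must be quoted, as in ["Tag1","Tag2"]; an unquoted string such as [Tag1,Tag2] raises json.JSONDecodeError.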
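One way to cope with both quoted and unquoted tags is sketched below. `parse_tags` is a hypothetical helper, not part of pandas or the json module, and the fallback branch is an assumption about how unquoted input should be handled.

```python
import json

def parse_tags(cell):
    """Hypothetical helper: turn a bracketed tag string into a list."""
    try:
        # Works only for valid JSON such as '["Tag1","Tag2"]'.
        return json.loads(cell)
    except json.JSONDecodeError:
        # Fallback for unquoted tags such as '[Tag1,Tag2]'.
        inner = cell.strip("[]")
        return [tag.strip() for tag in inner.split(",")] if inner else []

print(parse_tags('["Tag1","Tag2"]'))  # ['Tag1', 'Tag2']
print(parse_tags('[Tag1,Tag2]'))      # ['Tag1', 'Tag2']
```

It could then be applied to the column with something like `df['Tags'].apply(parse_tags)`.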