Using str in split in pandas

寵の児 submitted on 2020-08-27 04:53:16

Question


Here is some dummy data I have created for my question. I have two questions regarding this:

  1. Why is split working by using str in the first part of the query and not in the second part?
  2. How come [0] is picking up the first row in part 1 and the first element from each row in part 2?

import pandas as pd
chess_data = pd.DataFrame({"winner": ['A:1','A:2','A:3','A:4','B:1','B:2']})

chess_data.winner.str.split(":")[0]
['A', '1']

chess_data.winner.map(lambda n: n.split(":")[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Answer 1:


  • chess_data is a dataframe
  • chess_data.winner is a series
  • chess_data.winner.str is an accessor to methods that are string specific and optimized (to a degree)
  • chess_data.winner.str.split is one such method
  • chess_data.winner.map is a different method that takes a dictionary or a callable object. It either calls that callable with each element in the series or calls the dictionary's get method on each element of the series.

When you use chess_data.winner.str.split, pandas loops over the series internally and applies a kind of str.split to each element. map is a cruder way of doing the same thing, because you supply the per-element function yourself.
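One place where the difference between the two shows up is missing data. The sketch below is only an illustration: the series with an added NaN is made up for the example and is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values like the question's, plus one missing entry.
winner = pd.Series(["A:1", "B:2", np.nan], name="winner")

# The .str accessor skips missing values and leaves NaN in the result.
via_str = winner.str.split(":").str[0]

# A plain lambda tries to call .split on the float NaN and fails.
try:
    winner.map(lambda n: n.split(":")[0])
    map_failed = False
except AttributeError:
    map_failed = True

# na_action="ignore" tells map to pass NaN through without calling the function.
via_map = winner.map(lambda n: n.split(":")[0], na_action="ignore")

print(via_str.tolist(), map_failed, via_map.tolist())
```

With clean data like the question's, both approaches give the same values.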


With your data.

chess_data.winner.str.split(':')

0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

To get the first element of each row's list, use the string accessor again:

chess_data.winner.str.split(':').str[0]

0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

This is the equivalent way of performing what you had done in your map

chess_data.winner.map(lambda x: x.split(':')[0])

You could have also used a comprehension

chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

  winner new_col
0    A:1       A
1    A:2       A
2    A:3       A
3    A:4       A
4    B:1       B
5    B:2       B
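If both halves of each value are wanted as separate columns, str.split also accepts expand=True. This is a small sketch rather than something the question asked for, and the column names "player" and "game" are invented for the example.

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ["A:1", "A:2", "A:3", "A:4", "B:1", "B:2"]})

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = chess_data["winner"].str.split(":", expand=True)

# Hypothetical column names, chosen only for readability.
parts.columns = ["player", "game"]

print(parts)
```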



Answer 2:


Your code,

chess_data['winner'].str.split(':')[0] 
['A', '1']

Is the same as,

chess_data['winner'].str.split(':').loc[0] 
['A', '1']

And,

chess_data['winner'].map(lambda n: n.split(':')[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Is the same as,

chess_data.winner.str.split(':').str[0]
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Which is also the same as,

pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner') 
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
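The equivalence of these three forms can be checked directly. A minimal sketch on the question's data, comparing values only (the resulting dtype may differ between pandas versions):

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ["A:1", "A:2", "A:3", "A:4", "B:1", "B:2"]})
winner = chess_data["winner"]

# Three ways of taking the part before the colon in every row.
by_accessor = winner.str.split(":").str[0]
by_map = winner.map(lambda n: n.split(":")[0])
by_comprehension = pd.Series([x.split(":")[0] for x in winner], name="winner")

# Compare the values as plain Python lists.
same_values = by_accessor.tolist() == by_map.tolist() == by_comprehension.tolist()
print(same_values)
```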



Answer 3:


It is explained in the documentation under "Indexing using str".

The .str[index] notation indexes into each string by position, whereas [index] selects from the series by its index label.

Using the example

import numpy as np
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

s.str[3]

returns the character at position 3 of each string (NaN where the string is too short or the value is missing)

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7    NaN
8    NaN

Whereas

s[3]

returns

'Aaba'
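The same distinction applies to the lists produced by split. In the sketch below the index is relabeled with letters, which is an assumption made only so that "label" and "position" are visibly different; the question's data has the default 0..5 index, where label 0 and position 0 happen to coincide.

```python
import pandas as pd

# Hypothetical relabeled index, used only for illustration.
winner = pd.Series(["A:1", "A:2", "B:1"], index=["x", "y", "z"], name="winner")
pieces = winner.str.split(":")

first_row = pieces.loc["x"]      # one row, selected by index label
also_first_row = pieces.iloc[0]  # one row, selected by position in the series
first_of_each = pieces.str[0]    # position 0 inside every row's list

print(first_row, also_first_row, first_of_each.tolist())
```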



Answer 4:


Use the apply method to extract the first value from each list in the split Series

chess_data.winner.str.split(':')
Out: 
0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

chess_data.winner.str.split(':').apply(lambda x: x[0])
Out:
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

When you use

chess_data.winner.str.split(":")[0] 

you just get the first item from the resulting series. .apply(), on the other hand, applies a function (in this case a lambda that behaves like operator.itemgetter(0)) to every value in the series and returns another series.
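For comparison, here is a short sketch of the same apply call written with operator.itemgetter from the standard library instead of a lambda. It assumes the question's data and is one possible spelling, not the only one.

```python
from operator import itemgetter

import pandas as pd

chess_data = pd.DataFrame({"winner": ["A:1", "A:2", "A:3", "A:4", "B:1", "B:2"]})
pieces = chess_data["winner"].str.split(":")

# Both callables return element 0 of each list in the series.
with_lambda = pieces.apply(lambda x: x[0])
with_itemgetter = pieces.apply(itemgetter(0))

print(with_itemgetter.tolist())
```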



Source: https://stackoverflow.com/questions/51911933/using-str-in-split-in-pandas
