Using str in split in pandas

Question

Here is some dummy data I have created for my question. I have two questions regarding this:

Why does split need str in the first snippet but not in the second?
Why does [0] pick up the first row in the first snippet but the first element of each row in the second?

import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1','A:2','A:3','A:4','B:1','B:2']})

chess_data.winner.str.split(":")[0]
['A', '1']

chess_data.winner.map(lambda n: n.split(":")[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Answer 1:

chess_data is a dataframe
chess_data.winner is a series
chess_data.winner.str is an accessor to methods that are string specific and optimized (to a degree)
chess_data.winner.str.split is one such method
chess_data.winner.map is a different method that takes a dictionary or a callable object. Given a callable, it calls it with each element of the series; given a dictionary, it looks each element up as a key, much like calling the dictionary's get method.

When you use chess_data.winner.str.split, pandas still loops over the elements and applies a kind of str.split to each one. map is a cruder way of doing the same thing.
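
As a rough illustration of the two kinds of argument map accepts, the sketch below passes first a callable and then a dictionary. The lookup dictionary and the variable names by_function and by_dict are made up for this example and are not part of the original question.

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# Callable: map calls the function once per element.
by_function = chess_data.winner.map(lambda x: x.split(':')[0])

# Dictionary (hypothetical lookup table): map looks each element up as a key.
# Elements with no matching key come back as missing values.
lookup = {'A:1': 'A', 'A:2': 'A', 'B:1': 'B'}
by_dict = chess_data.winner.map(lookup)

print(by_function.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
print(by_dict.isna().tolist())  # [False, False, True, True, False, True]
```

The dictionary form only makes sense when the full values are known in advance, which is why the callable form is the one that matches the question.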

With your data.

chess_data.winner.str.split(':')

0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

In order to get each first element, you'll want to use the string accessor again

chess_data.winner.str.split(':').str[0]

0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
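
One practical difference, sketched here under the assumption that the column might contain missing values (the data in the question has none): the str accessor passes NaN through, while a plain lambda in map fails on it unless na_action='ignore' is given. The with_gap series is invented for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data with one missing value.
with_gap = pd.Series(['A:1', np.nan, 'B:2'], dtype=object, name='winner')

# The str accessor skips the missing value and leaves NaN in its place.
via_str = with_gap.str.split(':').str[0]

# A bare lambda tries to call .split on the float NaN and fails.
try:
    with_gap.map(lambda x: x.split(':')[0])
    map_failed = False
except AttributeError:
    map_failed = True

# na_action='ignore' tells map to leave missing values untouched.
via_map = with_gap.map(lambda x: x.split(':')[0], na_action='ignore')

print(via_str.tolist())
print(map_failed)  # True
print(via_map.tolist())
```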

This is equivalent to what you did with map

chess_data.winner.map(lambda x: x.split(':')[0])

You could have also used a comprehension

chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

  winner new_col
0    A:1       A
1    A:2       A
2    A:3       A
3    A:4       A
4    B:1       B
5    B:2       B

Answer 2:

Your code,

chess_data['winner'].str.split(':')[0] 
['A', '1']

Is the same as,

chess_data['winner'].str.split(':').loc[0] 
['A', '1']

And,

chess_data['winner'].map(lambda n: n.split(':')[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Is the same as,

chess_data.winner.str.split(':').str[0]
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

Which is also the same as,

pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner') 
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
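
To see that the plain [0] is a label lookup rather than "first row", here is a small sketch with a non-default integer index. The index values 10 to 15 are arbitrary and chosen only for this demonstration.

```python
import pandas as pd

# Same data, but the row labels start at 10 instead of 0 (arbitrary choice).
winner = pd.Series(['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2'],
                   index=[10, 11, 12, 13, 14, 15], name='winner')
parts = winner.str.split(':')

# [10] and .loc[10] both select the row labelled 10.
print(parts[10])      # ['A', '1']
print(parts.loc[10])  # ['A', '1']

# There is no row labelled 0, so [0] raises KeyError
# instead of returning the first row.
try:
    parts[0]
    label_missing = False
except KeyError:
    label_missing = True
print(label_missing)  # True

# .str[0] is unaffected by the labels: it indexes inside each list.
print(parts.str[0].tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
```

In the question, [0] appeared to return the first row only because the default index happens to label the first row 0.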

Answer 3:

It is explained in the documentation under Indexing using str

The .str[index] notation indexes into each string by position, whereas [index] selects from the series itself by its index label.

Using the example

import numpy as np
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])

s.str[3]

returns the character at position 3 of the string in each row, or NaN where the string is too short or the value is missing

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7    NaN
8    NaN

Whereas

s[3]

returns

'Aaba'

Answer 4:

Use the apply method to extract the first value from each list in the split Series

chess_data.winner.str.split(':')
Out: 
0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

chess_data.winner.str.split(':').apply(lambda x: x[0])
Out:
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

When you use

chess_data.winner.str.split(":")[0]

you just get the first item of the resulting series. But .apply() applies a function, in this case a lambda that does the same job as operator.itemgetter(0), to every value in the series and returns another series.
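
Since itemgetter is mentioned, here is a minimal sketch of what that variant might look like, using operator.itemgetter from the standard library in place of the lambda. The names split_lists and first_parts are chosen for this example only.

```python
from operator import itemgetter

import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})
split_lists = chess_data.winner.str.split(':')

# itemgetter(0) builds a callable that behaves like lambda x: x[0].
first_parts = split_lists.apply(itemgetter(0))

print(first_parts.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
```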

Source: https://stackoverflow.com/questions/51911933/using-str-in-split-in-pandas

Tags: python, string, pandas, dataframe, split