Question
Here is some dummy data I have created for my question. I have two questions regarding this:
- Why does split work with str in the first part of the query but without it in the second part?
- How come [0] picks up the first row in part 1 and the first element from each row in part 2?
import pandas as pd
chess_data = pd.DataFrame({"winner": ['A:1','A:2','A:3','A:4','B:1','B:2']})
chess_data.winner.str.split(":")[0]
['A', '1']
chess_data.winner.map(lambda n: n.split(":")[0])
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
Answer 1:
- chess_data is a dataframe
- chess_data.winner is a series
- chess_data.winner.str is an accessor to methods that are string specific and optimized (to a degree)
- chess_data.winner.str.split is one such method
- chess_data.winner.map is a different method that takes a dictionary or a callable object and either calls that callable with each element in the series or calls the dictionary's get method on each element of the series.
When you use chess_data.winner.str.split, pandas does the looping for you and performs a kind of str.split on each element, while map is a cruder way of doing the same thing.
With your data:
chess_data.winner.str.split(':')
0 [A, 1]
1 [A, 2]
2 [A, 3]
3 [A, 4]
4 [B, 1]
5 [B, 2]
Name: winner, dtype: object
To get the first element of each list, use the string accessor again:
chess_data.winner.str.split(':').str[0]
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
This is equivalent to what you did with map:
chess_data.winner.map(lambda x: x.split(':')[0])
You could also have used a comprehension:
chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])
winner new_col
0 A:1 A
1 A:2 A
2 A:3 A
3 A:4 A
4 B:1 B
5 B:2 B
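Another option, not mentioned in the answer above, is to pass expand=True to str.split. The result is then a DataFrame with one column per piece, so [0] selects a column rather than a row. A small sketch, assuming every value contains exactly one ":":

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# expand=True returns a DataFrame whose columns are labelled 0, 1, ...
parts = chess_data.winner.str.split(":", expand=True)

# On a DataFrame, [0] selects the column labelled 0,
# which holds the first piece of every row
first = parts[0]

print(parts.shape)
print(first.tolist())
```

Here [0] gives the first element of each row because it is indexing a DataFrame's columns, not a Series' rows.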
Answer 2:
Your code,
chess_data['winner'].str.split(':')[0]
['A', '1']
Is the same as,
chess_data['winner'].str.split(':').loc[0]
['A', '1']
And,
chess_data['winner'].map(lambda n: n.split(':')[0])
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
Is the same as,
chess_data.winner.str.split(':').str[0]
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
Which is also the same as,
pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner')
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
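The equivalence between [0] and .loc[0] shown in this answer is a lookup by index label, which happens to coincide with the first row only because the default index starts at 0. A small sketch with a hypothetical series whose integer index is reversed makes the difference visible:

```python
import pandas as pd

# Hypothetical series with a reversed integer index (not from the question)
s = pd.Series(["A:1", "A:2", "B:1"], index=[2, 1, 0])
parts = s.str.split(":")

by_bracket = parts[0]        # label lookup: the row whose index label is 0
by_loc = parts.loc[0]        # the same label lookup, spelled explicitly
by_position = parts.iloc[0]  # positional lookup: the first row

print(by_bracket)   # ['B', '1']
print(by_loc)       # ['B', '1']
print(by_position)  # ['A', '1']
```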
Answer 3:
This is explained in the documentation under "Indexing using str".
The .str[index] notation indexes each string by position, whereas plain [index] selects from the series by its index.
Using the example
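import numpy as np
import pandas as pd
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s.str[3]
returns the character at position 3 of the string in each row (NaN where the string is too short or missing)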
0 NaN
1 NaN
2 NaN
3 a
4 a
5 NaN
6 A
7 NaN
8 NaN
Whereas
s[3]
returns
'Aaba'
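The same distinction applies to the lists produced by str.split, since .str[index] indexes into whatever each element holds. A short sketch using a two-row series made up for illustration:

```python
import pandas as pd

# Small illustrative series (not from the question)
s = pd.Series(["A:1", "B:2"])

first_chars = s.str[0]                  # first character of each string
second_parts = s.str.split(":").str[1]  # second item of each split list
row_one = s.str.split(":")[1]           # the whole list at index label 1

print(first_chars.tolist())   # ['A', 'B']
print(second_parts.tolist())  # ['1', '2']
print(row_one)                # ['B', '2']
```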
Answer 4:
Use the apply method to extract the first value from each list in the split series:
chess_data.winner.str.split(':')
Out:
0 [A, 1]
1 [A, 2]
2 [A, 3]
3 [A, 4]
4 [B, 1]
5 [B, 2]
Name: winner, dtype: object
chess_data.winner.str.split(':').apply(lambda x: x[0])
Out:
0 A
1 A
2 A
3 A
4 B
5 B
Name: winner, dtype: object
When you use
chess_data.winner.str.split(":")[0]
you just get the first item from the resulting series. But .apply() applies a function to every value in the series, in this case a lambda that behaves like an itemgetter for position 0, and returns another series.
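For reference, the itemgetter mentioned above lives in the standard library's operator module and can be passed to apply in place of the lambda. A minimal sketch:

```python
from operator import itemgetter

import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# itemgetter(0) builds a callable that returns obj[0] for whatever it is given,
# so apply calls it once per list in the split series
first = chess_data.winner.str.split(":").apply(itemgetter(0))

print(first.tolist())
```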
Source: https://stackoverflow.com/questions/51911933/using-str-in-split-in-pandas