Pandas Split Function in Reverse

遥遥无期 2021-01-15 08:46

I have a pandas DataFrame with a column that looks like this:

    Car_Make
0   2017 Abarth 124 Spider ManualConvertible
1   2017 Abarth 124 Spider AutoConver         


        
3 Answers
  •  甜味超标
    2021-01-15 09:28

    The code you're asking about is:

    df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1]))
    

    There are several things going on here:

    1.) First, a lambda is basically an impromptu function. In this case, it's an unnamed function that takes the argument x and returns pd.Series(x.split()[::-1]). More on x later.

    2.) pd.Series(...), as you know, creates a pandas Series object, much like your original column.

    3.) x.split() splits the string x into a list of words. With no argument, it splits on whitespace by default.

    4.) The [::-1] bit is a slice. Much like range(), a slice takes three parameters, [start:stop:step]. Here start and stop are left out, so the slice covers the whole list, and the step of -1 walks through it backwards, i.e. in reverse. Note that, unlike range(), where the stop value is mandatory, all three slice parameters are optional.

    5.) The main function here is apply() on your df['Car_Make'] Series, which is essentially a list of strings. apply() takes a function (much like map()) and applies it to each element of the Series. In this case, it applies the lambda, passing each string in the Series as the argument x.

    6.) Putting everything back together. The statement is:

    • apply() passes each string in df['Car_Make'] as x to the lambda.
    • The lambda calls x.split() to split the string into a list of words.
    • The slice [::-1] then reverses the order of that list (it reverses, it does not sort).
    • pd.Series() converts the reversed list into a Series object.
    • The lambda returns that Series object to apply().
    • Because the lambda returns a Series for every row, apply() combines them into a DataFrame with one column per word, which conveniently holds the words of each string in reverse order.
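
    The steps above can be traced one at a time on a small example. This is only a sketch: the two sample rows are assumed values modelled on the question's column, and the names words, reversed_words and result are made up for illustration.

```python
import pandas as pd

# Assumed sample data, modelled on the column shown in the question.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
    ]
})

# What the lambda does to a single value x.
x = df["Car_Make"].iloc[0]
words = x.split()              # split on whitespace into a list of words
reversed_words = words[::-1]   # same list, walked backwards

print(words)           # ['2017', 'Abarth', '124', 'Spider', 'ManualConvertible']
print(reversed_words)  # ['ManualConvertible', 'Spider', '124', 'Abarth', '2017']

# The full statement: one Series per row, combined by apply() into a DataFrame.
result = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))
print(type(result).__name__)   # DataFrame
print(result)
```

    Column 0 of result holds the last word of each string, column 1 the second-to-last, and so on.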

    If all you care about is the very last item of the split, though, you don't need to reverse the list at all. Indexing the split result with [-1] returns the last item right away:

    df['Car_Make'].apply(lambda x: pd.Series({'Car_Make': x.split()[-1]}))

                Car_Make
    0  ManualConvertible
    1    AutoConvertible
    2  ManualConvertible
    3    AutoConvertible
    4        ManualHatch
    5          AutoHatch
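
    As a possible alternative to apply(), the same last item can be taken with the vectorized .str accessor. This is a sketch and not part of the original answer: the sample rows are assumed values, and last_word and last_word_rsplit are illustrative names. Series.str.rsplit splits from the right, which is one way to read "split in reverse".

```python
import pandas as pd

# Assumed sample data, modelled on the column shown in the question.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
    ]
})

# Split every string on whitespace, then take the last element of each list.
last_word = df["Car_Make"].str.split().str[-1]

# Split from the right at most once, so only the final word is separated.
last_word_rsplit = df["Car_Make"].str.rsplit(n=1).str[-1]

print(last_word.tolist())         # ['ManualConvertible', 'AutoConvertible']
print(last_word_rsplit.tolist())  # ['ManualConvertible', 'AutoConvertible']
```

    Both variants return a Series rather than a DataFrame, so the result can be assigned straight back to a column.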
    

    Thank you for asking this question. I learned a few things about pandas while writing this answer as well.
