Split strings in tuples into columns, in Pandas

梦如初夏 2020-12-03 02:43

I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numb

3 answers
  • 2020-12-03 03:15

    Assuming you have a column which contains tuples (as it appears in your example) rather than strings, this will work:

    import pandas
    from operator import itemgetter

    df = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': [(1, 2, 3, 4, 5), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5)]}).set_index("Track ID")

    # itemgetter(i) pulls element i out of every tuple into its own column
    for i in range(5):
        df["Col {}".format(i)] = df.stats.apply(itemgetter(i))
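
    A slightly more general sketch of the same loop, assuming every tuple in the column has the same length: it reads the width from the first row instead of hard-coding 5. The sample values and the "Col N" names are only for illustration.

```python
import pandas as pd
from operator import itemgetter

# Sample data (illustrative): a "stats" column of equal-length tuples, indexed by Track ID
df = pd.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]}
).set_index("Track ID")

# Read the tuple width from the first row instead of hard-coding 5
# (assumes all tuples have the same length)
width = len(df["stats"].iloc[0])
for i in range(width):
    # itemgetter(i) pulls the i-th element out of each tuple
    df["Col {}".format(i)] = df["stats"].apply(itemgetter(i))

print(df)
```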
    

    If you actually have strings that look like tuples, you can parse them first and then apply the same pattern as above:

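    df = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': ["(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)"]}).set_index("Track ID")
    # Strip the parentheses, split on ", " and convert each piece to a number
    df["stats"] = df["stats"].str.strip("()").str.split(", ").apply(lambda parts: [float(p) for p in parts])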
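
    If the strings are not always formatted with exactly one space after each comma, splitting on ", " may not split them correctly. A swapped-in alternative (not what this answer uses) is ast.literal_eval from the Python standard library, which parses each string into a real tuple of numbers; the itemgetter loop from the first snippet then works unchanged. A minimal sketch with made-up sample strings:

```python
import ast
import pandas as pd

# Illustrative strings with inconsistent spacing around the commas
df = pd.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": ["(1, 2, 3, 4, 5)", "(1.5,2,3,4,5)", "(-0.5,  2, 3, 4, 5)"]}
).set_index("Track ID")

# literal_eval only evaluates Python literals, so each string becomes a tuple
# of ints/floats regardless of how the commas are spaced
df["stats"] = df["stats"].apply(ast.literal_eval)

print(df)
```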
    
  • 2020-12-03 03:18

    If you have a column of tuples rather than strings, and you want them as DataFrame columns, this is the simplest approach (it treats Track ID as a regular column, not the index):

    import pandas as pd
    df = pd.concat([df['Track ID'], pd.DataFrame(df['stats'].values.tolist(), index=df.index)], axis=1)  # index=df.index keeps rows aligned
    

    If they are actually strings, first convert them to lists of numbers, then concatenate in the same way:

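    # Strip the parentheses and split on ', '; index=df.index keeps the rows aligned in concat
    dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).values.tolist(), index=df.index).astype(float)
    df = pd.concat([df['Track ID'], dfpart], axis=1)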
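
    Why index=df.index matters: pd.concat with axis=1 aligns rows by index label, while pd.DataFrame built from a plain list always gets a fresh 0..n-1 index. The two only line up by accident when df still has its default index. A sketch with a hypothetical filtered subset whose index no longer starts at 0:

```python
import pandas as pd

df = pd.DataFrame(
    {"Track ID": [14, 28, 42, 56],
     "stats": [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10),
               (11, 12, 13, 14, 15), (16, 17, 18, 19, 20)]}
)
# Hypothetical filter: the remaining rows keep their labels 1, 2, 3
subset = df[df["Track ID"] > 14]

# No index passed: the new frame is labelled 0..2, so concat aligns
# mismatched labels and produces NaN-padded rows
naive = pd.concat([subset["Track ID"], pd.DataFrame(subset["stats"].tolist())], axis=1)

# index=subset.index keeps each expanded row next to its own Track ID
stats = pd.DataFrame(subset["stats"].tolist(), index=subset.index)
fixed = pd.concat([subset["Track ID"], stats], axis=1)

print(naive)
print(fixed)
```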
    
  • 2020-12-03 03:29

    And for the other case, assuming they are strings that look like tuples:

    In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
    Out[74]:
              0         1         2         3         4
    0 -0.009242  0.410000 -0.742016  0.003683  0.002517
    1  0.041154  0.318231  0.758717  0.002640  0.010654
    2 -0.014435  0.168438 -0.808703  0.000817  0.003166
    3  0.034346  0.288731  0.950845  0.000001  0.003373
    4  0.009052  0.151031  0.670257  0.012179  0.003022
    5 -0.004797  0.171615 -0.552879  0.050032  0.002180
    

    (note: for older versions of pandas (< 0.16.1), you need to use return_type='frame' instead of the expand keyword)
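
    A self-contained sketch of this approach that also attaches the result back to the frame, assuming the stats strings look like the ones above (the values are taken from the output table). Removing spaces before splitting is an extra step, so the float conversion does not depend on how whitespace after the commas is handled; the stat_ prefix is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": ["(-0.009242, 0.41, -0.742016, 0.003683, 0.002517)",
               "(0.041154, 0.318231, 0.758717, 0.00264, 0.010654)",
               "(-0.014435, 0.168438, -0.808703, 0.000817, 0.003166)"]}
).set_index("Track ID")

# Drop the surrounding parentheses, remove spaces, split on commas into
# separate columns, then convert every piece to float
parts = (
    df["stats"].str[1:-1]
    .str.replace(" ", "", regex=False)
    .str.split(",", expand=True)
    .astype(float)
)

# str.split(expand=True) keeps df's index, so join lines up by Track ID
df = df.join(parts.add_prefix("stat_"))

print(df)
```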

    By the way, if they are tuples and not strings, you can simply do the following:

    pd.DataFrame(df['stats'].tolist(), index=df.index)
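
    To replace the stats column with the expanded columns, the result of that call can be joined back on the index. The names a to e below are hypothetical placeholders for whatever the five values represent:

```python
import pandas as pd

df = pd.DataFrame(
    {"Track ID": [14, 28, 42],
     "stats": [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)]}
).set_index("Track ID")

# Hypothetical column names; substitute the real meaning of each value
names = ["a", "b", "c", "d", "e"]

# Expand the tuples into a frame that shares df's index, then swap it in for "stats"
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index, columns=names)
df = df.drop(columns="stats").join(expanded)

print(df)
```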
    