Shuffling pandas data frame rows while avoiding consecutive condition values

此生再无相见时 submitted on 2020-01-25 07:52:06

Question


I have a sample data frame read in using pandas. The data has two columns: 'item' and 'label'. When I shuffle the rows of df, I want to make sure the shuffled df never has two consecutive rows with the same label. I.e., this is acceptable, because no two adjacent rows share a label ('a' appears twice, but not in a row):

1: fire, 'a'

2: smoke, 'b'

3: honey bee, 'a'

4: curtain, 'c'

but I want to avoid orderings in which rows with the same label end up at consecutive indices, i.e.:

  1. fire, 'a'

  2. honey bee, 'a'

  3. smoke, 'b'

  4. curtain, 'c'

So far, I can shuffle using:

df = df.sample(frac=1).reset_index(drop=True)

I have a vague idea of reshuffling in a loop until df['label'][i+1] != df['label'][i] holds for every i, but I'm not sure exactly how to do it. Any pointers or simpler suggestions would be appreciated!
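The check described above (comparing each label with the next one) can be done without an explicit Python loop by comparing the label column with a copy of itself shifted down by one row. A minimal sketch, assuming the column is named 'label' as in the question; the helper name has_consecutive_repeats is hypothetical:

```python
import pandas as pd


def has_consecutive_repeats(df: pd.DataFrame, col: str = "label") -> bool:
    """Return True if any two adjacent rows share the same value in `col`."""
    labels = df[col]
    # shift(1) moves each value down one row, so row i is compared with row i-1;
    # the first row is compared with NaN, which never counts as equal.
    return bool((labels == labels.shift(1)).any())


# The two orderings from the question.
good = pd.DataFrame({"item": ["fire", "smoke", "honey bee", "curtain"],
                     "label": ["a", "b", "a", "c"]})
bad = pd.DataFrame({"item": ["fire", "honey bee", "smoke", "curtain"],
                    "label": ["a", "a", "b", "c"]})

print(has_consecutive_repeats(good))  # False
print(has_consecutive_repeats(bad))   # True
```

Combined with the shuffle from the question, a retry loop would be `while has_consecutive_repeats(df): df = df.sample(frac=1).reset_index(drop=True)`. Such a loop never ends if one label holds more than half of the rows (rounded up), because then no valid ordering exists.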


Answer 1:


Thanks for the comments and pointers. I got it to work with the following loop:

# xlistbase is the original DataFrame read in from the file
randomized = False
while not randomized:
    # shuffle every row and renumber the index 0..n-1
    xlist = xlistbase.sample(frac=1).reset_index(drop=True)
    # assume the order is fine, then look for two equal adjacent labels
    randomized = True
    for i in range(len(xlist) - 1):
        if xlist['label'][i] == xlist['label'][i + 1]:
            randomized = False  # repeat found: reshuffle
            break
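The loop in this answer is rejection sampling: it reshuffles until it happens to hit a valid order. It never terminates when one label holds more than half of the rows (rounded up), since no valid order exists then, and for long frames with only a few labels the chance that a random shuffle has no adjacent repeats shrinks quickly. As an alternative (a different technique from the one used in the answer), the sketch below builds the order row by row: at each step it picks, weighted by remaining count, a label that differs from the previous row and still leaves the remaining rows arrangeable. The helper names shuffle_no_repeats and _can_finish are hypothetical, and the result is random but not uniformly distributed over all valid orderings.

```python
import random
from collections import Counter, defaultdict

import pandas as pd


def _can_finish(counts, prev):
    """True if the remaining labels can be ordered with no two equal
    neighbours, given that the next row must differ from `prev`."""
    m = sum(counts.values())
    for label, c in counts.items():
        # `prev` may not take the next slot, so it has one usable slot fewer.
        limit = m // 2 if label == prev else (m + 1) // 2
        if c > limit:
            return False
    return True


def shuffle_no_repeats(df, col="label", seed=None):
    """Randomly reorder df so that no two adjacent rows share a value in `col`."""
    rng = random.Random(seed)
    labels = df[col].tolist()

    # Row positions grouped by label, each group in random order.
    rows = defaultdict(list)
    for pos, label in enumerate(labels):
        rows[label].append(pos)
    for positions in rows.values():
        rng.shuffle(positions)

    counts = Counter(labels)
    if not _can_finish(counts, None):
        raise ValueError("no ordering without consecutive repeats exists")

    order, prev = [], None
    for _ in range(len(labels)):
        # Labels that differ from the previous row and keep the rest solvable.
        candidates = []
        for label in list(counts):
            if counts[label] == 0 or label == prev:
                continue
            counts[label] -= 1
            if _can_finish(counts, label):
                candidates.append(label)
            counts[label] += 1
        # Pick one, weighted by how many rows still carry each label.
        label = rng.choices(candidates, weights=[counts[c] for c in candidates])[0]
        counts[label] -= 1
        order.append(rows[label].pop())
        prev = label

    return df.iloc[order].reset_index(drop=True)


df = pd.DataFrame({"item": ["fire", "smoke", "honey bee", "curtain"],
                   "label": ["a", "b", "a", "c"]})
print(shuffle_no_repeats(df, seed=1))
```

Passing seed makes the order reproducible. Each step checks every label against every other, so the cost grows roughly with (number of rows) × (distinct labels)², which is fine for a handful of labels.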


Source: https://stackoverflow.com/questions/54450767/shuffling-pandas-data-frame-rows-while-avoiding-consecutive-condition-values
