Count appearances of a value until it changes to another value

予麋鹿 2020-12-17 21:17

I have the following DataFrame:

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

I want to calculate the number of times each value appears consecutively before it changes to another value, i.e. the length of each run. For the data above the expected result is:

    values
    10    2
    23    2
    9     3
    10    4
    12    1

6 Answers
  • 2020-12-17 21:29

    You can keep track of where the changes in df['values'] occur, then group by those change markers together with df['values'] (to keep the values as the index), computing the size of each group:

    changes = df['values'].diff().ne(0).cumsum()
    df.groupby([changes,'values']).size().reset_index(level=0, drop=True)
    
     values
    10    2
    23    2
    9     3
    10    4
    12    1
    dtype: int64
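
    Put together as a runnable sketch (the df is the sample from the question):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12],
                  columns=['values'])

# Mark where the value changes; the cumulative sum labels each run
changes = df['values'].diff().ne(0).cumsum()

# Group by run label and value, then drop the run label from the index
out = df.groupby([changes, 'values']).size().reset_index(level=0, drop=True)
print(out)
```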
    
  • 2020-12-17 21:34

    This is far from the most time/memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Please feel encouraged to suggest improvements.

    import pandas as pd
    
    df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
    
    # Collect (value, run_length) pairs in order, so a value that
    # appears in more than one run (like 10 here) is counted per run.
    runs = []
    curr_val = df.iloc[0]['values']
    count = 1
    for i in range(1, len(df)):
        if df.iloc[i]['values'] == curr_val:
            count += 1
        else:
            runs.append((curr_val, count))
            curr_val = df.iloc[i]['values']
            count = 1
    runs.append((curr_val, count))
    
    counts = pd.Series([c for _, c in runs], index=[v for v, _ in runs])
    print(counts)
    
  • 2020-12-17 21:37

    Using crosstab

    df['key']=df['values'].diff().ne(0).cumsum()
    pd.crosstab(df['key'],df['values'])
    Out[353]: 
    values  9   10  12  23
    key                   
    1        0   2   0   0
    2        0   0   0   2
    3        3   0   0   0
    4        0   4   0   0
    5        0   0   1   0
    

    Slightly modifying the result above:

    pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
    Out[355]: 
    key  values
    1    10        2
    2    23        2
    3    9         3
    4    10        4
    5    12        1
    dtype: int64
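
    A self-contained sketch of the crosstab-plus-stack route above (using the sample df from the question):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12],
                  columns=['values'])

# Label each run of equal values, cross-tabulate run vs value,
# then stack and drop the zero cells
df['key'] = df['values'].diff().ne(0).cumsum()
out = pd.crosstab(df['key'], df['values']).stack().loc[lambda x: x.ne(0)]
print(out)
```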
    

    Based on Python's itertools.groupby:

    from itertools import groupby
    
    [ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
    Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
    
  • 2020-12-17 21:41

    itertools.groupby

    from itertools import groupby
    
    pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))
    
    10    2
    23    2
    9     3
    10    4
    12    1
    dtype: int64
    

    Or with a generator:

    def f(x):
      count = 1
      for this, that in zip(x, x[1:]):
        if this == that:
          count += 1
        else:
          yield count, this
          count = 1
      yield count, [*x][-1]
    
    pd.Series(*zip(*f(df['values'])))
    
    10    2
    23    2
    9     3
    10    4
    12    1
    dtype: int64
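
    The starred one-liner above can be unpacked into steps (a sketch over a plain list, mirroring what pd.Series(*zip(*...)) does):

```python
from itertools import groupby

import pandas as pd

vals = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]

# groupby yields (key, group-iterator) for each run of equal values
pairs = [[len(list(v)), k] for k, v in groupby(vals)]  # [[2, 10], [2, 23], ...]

# transpose into a tuple of run lengths and a tuple of keys,
# which become the data and index of the Series
lengths, keys = zip(*pairs)
out = pd.Series(lengths, index=keys)
print(out)
```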
    
  • 2020-12-17 21:48

    Use:

    df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()
    

    Or:

    df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()
    

    print (df)
    values  values
    1       10        2
    2       23        2
    3       9         3
    4       10        4
    5       12        1
    Name: values, dtype: int64
    

    Last, remove the first level:

    df = df.reset_index(level=0, drop=True)
    print (df)
    values
    10    2
    23    2
    9     3
    10    4
    12    1
    dtype: int64
    

    Explanation:

    Compare the original column with its shifted version using ne (not equal), then take the cumsum to build the helper Series:

    a = df['values'].shift()
    b = df['values'].ne(a)
    c = b.cumsum()
    print (pd.concat([df['values'], a, b, c], 
                     keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
        orig  shifted  not_equal  cumsum
    0     10      NaN       True       1
    1     10     10.0      False       1
    2     23     10.0       True       2
    3     23     23.0      False       2
    4      9     23.0       True       3
    5      9      9.0      False       3
    6      9      9.0      False       3
    7     10      9.0       True       4
    8     10     10.0      False       4
    9     10     10.0      False       4
    10    10     10.0      False       4
    11    12     10.0       True       5
    
  • 2020-12-17 21:48

    The groupby function from itertools can help here. For a string:

    >>> string = 'aabbaacc'
    >>> for char, freq in groupby(string):
    ...     print(char, len(list(freq)), sep=':', end='\n')
    [out]:
        a:2
        b:2
        a:2
        c:2
    

    The function also works for a list:

    >>> df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
    >>> for char, freq in groupby(df['values'].tolist()):
    ...     print(char, len(list(freq)), sep=':', end='\n')
    [out]:
        10:2
        23:2
         9:3
        10:4
        12:1
    

    Note: always access the column as df['values'] rather than df.values, because DataFrame already has a values attribute (the underlying NumPy array).
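
    A tiny illustration of that distinction (a sketch; the column name 'values' is from the question):

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23], columns=['values'])

print(type(df['values']))  # the column, as a pandas Series
print(type(df.values))     # every cell of the frame, as a NumPy ndarray
```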
