Building a Transition Matrix using words in Python/Numpy

抹茶落季 2020-12-09 22:32

I'm trying to build a 3x3 transition matrix with this data:

days=['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds',
  'rai


        
6 answers
  •  南笙 (original poster)
    2020-12-09 23:11

    I like a combination of pandas and itertools for this. The code block is a bit longer than the above, but don't conflate verbosity with speed. (The window func should be very fast; the pandas portion will be slower admittedly.)

    First, make a "window" function. Here's one from the itertools cookbook. It yields tuples of consecutive states, i.e. transitions (state1 to state2).

    from itertools import islice
    
    def window(seq, n=2):
        """Sliding window width n from seq.  From old itertools recipes."""
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    
    # list(window(days))
    # [('rain', 'rain'),
    #  ('rain', 'rain'),
    #  ('rain', 'clouds'),
    #  ('clouds', 'rain'),
    #  ('rain', 'sun'),
    # ...
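
    For the width-2 case used here, the same pairs can also be built with plain `zip`. This is only a minimal sketch: `sample_days` is a short hypothetical list (the question's full list is cut off above), and slicing assumes the input is a sequence such as a list rather than a one-shot iterator.

```python
# Hypothetical sample data, not the question's full list.
sample_days = ['rain', 'rain', 'clouds', 'rain', 'sun']

# Pair each element with its successor: (state1, state2).
pairs = list(zip(sample_days, sample_days[1:]))
print(pairs)
# [('rain', 'rain'), ('rain', 'clouds'), ('clouds', 'rain'), ('rain', 'sun')]
```

    The generator-based `window` stays the better choice for arbitrary iterables or for window widths other than 2.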
    

    Then use a pandas groupby + value counts operation to get a transition matrix from each state1 to each state2:

    import pandas as pd
    
    pairs = pd.DataFrame(window(days), columns=['state1', 'state2'])
    counts = pairs.groupby('state1')['state2'].value_counts()
    probs = (counts / counts.sum()).unstack()
    

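    Your result looks like this. Note that `counts.sum()` is the total number of transitions, so each entry is a fraction of all transitions and the whole table sums to 1, not each row. For a row-stochastic transition matrix, divide each row by its own sum, e.g. `probs.div(probs.sum(axis=1), axis=0)`: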

    print(probs)
    state2  clouds  rain   sun
    state1                    
    clouds    0.13  0.09  0.10
    rain      0.06  0.11  0.09
    sun       0.13  0.06  0.23
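
    If what you want is a matrix whose rows each sum to 1 (the probability of state2 given state1), one possible approach is `pd.crosstab` with `normalize='index'`. This is a sketch rather than the method used above; `sample_days`, `current`, `following` and `transition` are hypothetical names, and the data is a made-up short list because the question's list is truncated.

```python
import pandas as pd

# Hypothetical sample data; the question's full list is truncated.
sample_days = ['rain', 'rain', 'clouds', 'rain', 'sun',
               'clouds', 'clouds', 'rain']

# Align each state with the state that follows it.
current = pd.Series(sample_days[:-1], name='state1')
following = pd.Series(sample_days[1:], name='state2')

# normalize='index' divides every row by its own total,
# so each row is a conditional distribution over state2.
transition = pd.crosstab(current, following, normalize='index')

print(transition.round(2))
print(transition.sum(axis=1))  # every row sums to 1
```

    Pairs that never occur show up as 0 here instead of NaN, which is usually what you want before converting to a NumPy array with `transition.to_numpy()`.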
    
