I'm trying to build a 3x3 transition matrix with this data:
days = ['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds',
'rai
I like a combination of pandas and itertools for this. The code is a bit longer than the answer above, but don't conflate verbosity with speed. (The window function should be very fast; the pandas portion will admittedly be slower.)
First, make a "window" function. Here's one from the itertools recipes. It yields (state1, state2) tuples, one per transition.
from itertools import islice

def window(seq, n=2):
    """Sliding window of width n over seq. From the old itertools recipes."""
    it = iter(seq)
    # Fill the first window with the first n elements.
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    # Slide by one: drop the oldest element, append the next one.
    for elem in it:
        result = result[1:] + (elem,)
        yield result
# list(window(days))
# [('rain', 'rain'),
# ('rain', 'rain'),
# ('rain', 'clouds'),
# ('clouds', 'rain'),
# ('rain', 'sun'),
# ...
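For the width-2 window used here, the same pairs can also be built without a helper. This is an alternative sketch, not part of the original answer: `zip(days, days[1:])` pairs each day with the next one, and on Python 3.10+ `itertools.pairwise` does the same. The `days` list below is only the visible prefix of the question's data.

```python
import sys

# Visible prefix of the question's data (the full list is truncated there).
days = ['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds']

# Pair each day with the following day: one (state1, state2) tuple per transition.
pairs_zip = list(zip(days, days[1:]))

if sys.version_info >= (3, 10):
    from itertools import pairwise  # added to itertools in Python 3.10
    pairs_pairwise = list(pairwise(days))
    assert pairs_pairwise == pairs_zip  # both produce the same transitions

print(pairs_zip[:4])
# [('rain', 'rain'), ('rain', 'rain'), ('rain', 'clouds'), ('clouds', 'rain')]
```

Unlike `window`, the `zip` version needs a sliceable sequence; `pairwise` and `window` accept any iterable.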
Then use a pandas groupby + value_counts to count each state1-to-state2 transition and divide by the total number of transitions, giving a state1 x state2 table:
import pandas as pd
pairs = pd.DataFrame(window(days), columns=['state1', 'state2'])
counts = pairs.groupby('state1')['state2'].value_counts()
probs = (counts / counts.sum()).unstack()
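To see what the groupby step computes, here is a rough standard-library equivalent (an illustration, not how pandas does it internally), again on the visible prefix of `days`: a `Counter` over consecutive pairs plays the role of `value_counts`, and dividing by the total plays the role of `counts / counts.sum()`.

```python
from collections import Counter

# Visible prefix of the question's data (the full list is truncated there).
days = ['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds']

# Count each (state1, state2) transition, like groupby('state1')['state2'].value_counts().
counts = Counter(zip(days, days[1:]))
total = sum(counts.values())

# Divide by the total number of transitions, like counts / counts.sum().
# This is a joint frequency over all transitions: all cells together sum to 1.
joint = {pair: c / total for pair, c in counts.items()}

for (s1, s2), p in sorted(joint.items()):
    print(f"{s1:>6} -> {s2:<6} {p:.3f}")
```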
Your result looks like this:
print(probs)
state2  clouds  rain   sun
state1
clouds    0.13  0.09  0.10
rain      0.06  0.11  0.09
sun       0.13  0.06  0.23
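Dividing by `counts.sum()` normalizes over all transitions, so the whole table sums to 1 (here the rows sum to 0.32, 0.26 and 0.42). If you want a row-stochastic transition matrix, where each row gives P(next state | current state) and sums to 1, normalize within each state1 instead; in pandas, `pairs.groupby('state1')['state2'].value_counts(normalize=True).unstack(fill_value=0)` does that. The standard-library sketch below shows the same row normalization producing a 3x3 list of lists, assuming a fixed state order of `clouds`, `rain`, `sun` and using only the visible prefix of `days`.

```python
from collections import Counter

# Visible prefix of the question's data (the full list is truncated there).
days = ['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds']
states = ['clouds', 'rain', 'sun']  # assumed row/column order, matching the table above

counts = Counter(zip(days, days[1:]))

# Row-normalize: P(next = s2 | current = s1) = count(s1, s2) / count(s1, any).
matrix = []
for s1 in states:
    row_total = sum(counts[(s1, s2)] for s2 in states)
    # A state never seen as state1 (e.g. it only occurs as the last day) gets an all-zero row.
    matrix.append([counts[(s1, s2)] / row_total if row_total else 0.0 for s2 in states])

for s1, row in zip(states, matrix):
    print(s1, [round(p, 2) for p in row])
```

The all-zero fallback is a design choice; depending on the use case you might prefer to drop such a state or treat it as absorbing.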