Question
I have a matrix of class grades for different years (rows are years, columns are grades). I want to build a transition matrix showing the change between years.
For instance, with year t-1 on the y-axis and year t on the x-axis, I want a transition matrix with the difference in the number of people with grade A between year t-1 and t, with grade B between year t-1 and t, and so on. Then I want a second transition matrix with the proportions, for example: between year t-1 and t there are z% more/fewer people with grade A/B/C/D/F.
Obviously the most important part is the diagonal, which represents the change for the same grade between years.
I want this to be done in Python.
Thank you very much, I hope everything is clear.
Answer 1:
You can use the pandas library with df.diff to get the year-over-year differences. NumPy can generate the matrix of all pairwise differences using np.subtract.outer. Below is an example.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']
df = pd.DataFrame(np.random.randint(0, 10, (3, 4)), columns=grades, index=years)
print(df)
      A  B  C  D
2015  5  0  2  0
2016  7  2  0  2
2017  3  7  6  7
df_diff = df.diff(axis=0)
print(df_diff)
Each row of df_diff is the difference between the current row and the preceding row of the original df:
        A    B    C    D
2015  NaN  NaN  NaN  NaN
2016  2.0  2.0 -2.0  2.0
2017 -4.0  5.0  6.0  5.0
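For the proportions part of the question, one option is to divide each difference by the previous year's count. The following is only a sketch under some assumptions: it hard-codes the counts printed above instead of drawing random ones, the name df_pct is mine, and it marks the change as NaN wherever the previous year's count was 0, since a percentage change is undefined there. You may prefer a different convention for those cells.

```python
import numpy as np
import pandas as pd

years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']
# Same counts as the printed example above (hard-coded so the result is reproducible)
df = pd.DataFrame([[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
                  columns=grades, index=years)

# Percentage change relative to the previous year: (x_t - x_{t-1}) / x_{t-1} * 100
df_pct = df.diff(axis=0) / df.shift(1) * 100

# Division by a zero count yields +/-inf; treat those cells as undefined (assumption)
df_pct = df_pct.replace([np.inf, -np.inf], np.nan)
print(df_pct.round(1))
```

The first row is all NaN, as with df.diff, because 2015 has no preceding year.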
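# Flatten the table into one vector of counts, with a "year+grade" label per entry
a = np.array([])
differences = []
for i, y in enumerate(years):
    for j, g in enumerate(grades):
        differences.append(y + g)
        a = np.append(a, df.iloc[i, j])
# Entry [r, c] is a[r] - a[c], i.e. the row label's count minus the column label's count
df3 = pd.DataFrame(np.subtract.outer(a, a), columns=differences, index=differences)
print(df3)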
       2015A  2015B  2015C  2015D  2016A  2016B  2016C  2016D  2017A  2017B  2017C  2017D
2015A    0.0    5.0    3.0    5.0   -2.0    3.0    5.0    3.0    2.0   -2.0   -1.0   -2.0
2015B   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2015C   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2015D   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2016A    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
2016B   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2016C   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2016D   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2017A   -2.0    3.0    1.0    3.0   -4.0    1.0    3.0    1.0    0.0   -4.0   -3.0   -4.0
2017B    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
2017C    1.0    6.0    4.0    6.0   -1.0    4.0    6.0    4.0    3.0   -1.0    0.0   -1.0
2017D    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
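The matrix above compares every year/grade pair with every other pair. If you only care about a single grade across years, the same pairwise-difference idea can be applied to one column. This is a rough sketch, not the only way to do it: it hard-codes the counts from the example, the name change is mine, and I chose the sign so that rows are the earlier year (t-1) and columns the later year (t), meaning each entry is the column year's count minus the row year's count.

```python
import numpy as np
import pandas as pd

years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']
# Same counts as the printed example above (hard-coded so the result is reproducible)
df = pd.DataFrame([[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
                  columns=grades, index=years)

# Counts for one grade, one value per year
a = df['A'].to_numpy()

# Broadcasting: entry [i, j] = a[j] - a[i], the change from row year i to column year j
change = pd.DataFrame(a[None, :] - a[:, None], index=years, columns=years)
print(change)
```

With this convention, change.loc['2015', '2016'] is the change in the number of A grades from 2015 to 2016, and the diagonal is always zero.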
Plot this matrix using matshow from matplotlib:
plt.matshow(df3)
plt.colorbar()
plt.show()
Source: https://stackoverflow.com/questions/52682226/transition-matrix-for-counts-and-proportions-python