Python PCA plot using Hotelling's T2 for a confidence interval

Anonymous (unverified), submitted 2019-12-03 01:34:02

Question:

I am trying to apply PCA for multivariate analysis and draw the score plot for the first two components with a Hotelling's T2 confidence ellipse in Python. I was able to get the scatter plot, and I want to add a 95% confidence ellipse to it. It would be great if anyone knows how this can be done in Python.

Sample picture of expected output:

Answer 1:

This was bugging me, so I adapted an answer from "PCA and Hotelling's T^2 for confidence intervall in R" to Python (using some source code from the ggbiplot R package):

from sklearn import decomposition
from sklearn.preprocessing import StandardScaler
from scipy import stats  # import the submodule explicitly
import numpy as np
import matplotlib.pyplot as plt

# Generate data and fit PCA
np.random.seed(1)  # seed NumPy's generator, which is the one used below
data = np.random.normal(0, 1, 500).reshape(100, 5)
outliers = np.random.uniform(5, 10, 25).reshape(5, 5)
data = np.vstack((data, outliers))
pca = decomposition.PCA(n_components=2)
scaler = StandardScaler()
scaler.fit(data)
data = scaler.transform(data)
pcaFit = pca.fit(data)
dataProject = pcaFit.transform(data)

# Calculate ellipse bounds and plot with scores
theta = np.concatenate((np.linspace(-np.pi, np.pi, 50), np.linspace(np.pi, -np.pi, 50)))
circle = np.array((np.cos(theta), np.sin(theta)))
sigma = np.cov(np.array((dataProject[:, 0], dataProject[:, 1])))
ed = np.sqrt(stats.chi2.ppf(0.95, 2))  # chi-square quantile for the 95% level
ell = np.transpose(circle).dot(np.linalg.cholesky(sigma) * ed)
a, b = np.max(ell[:, 0]), np.max(ell[:, 1])  # 95% ellipse bounds
t = np.linspace(0, 2 * np.pi, 100)

# PCA scores are uncorrelated, so an axis-aligned ellipse is sufficient
plt.scatter(dataProject[:, 0], dataProject[:, 1])
plt.plot(a * np.cos(t), b * np.sin(t), color='red')
plt.grid(color='lightgray', linestyle='--')
plt.show()
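Note that the code above scales the ellipse with a chi-square quantile, which is the large-sample approximation. If a limit based on the F distribution is wanted instead, a minimal sketch could look like the following. The helper name `hotelling_t2_ellipse` is hypothetical, and the critical value p(n-1)/(n-p) * F(p, n-p) is only one common form, assumed here; other scalings, such as p(n^2-1)/(n(n-p)) * F(p, n-p), are also in use, so check which one your field expects.

```python
import numpy as np
from scipy import stats


def hotelling_t2_ellipse(scores, alpha=0.95, n_points=100):
    """Return (x, y) points of a T2-based ellipse for 2-D PCA scores.

    Hypothetical helper; the critical value below is an assumed form.
    """
    scores = np.asarray(scores, dtype=float)
    n, p = scores.shape  # n observations, p components (p = 2 here)

    # Assumed critical T2 value from the F distribution
    t2_crit = p * (n - 1) / (n - p) * stats.f.ppf(alpha, p, n - p)

    # PCA scores are uncorrelated, so the ellipse is axis-aligned and
    # each semi-axis depends only on that component's sample variance
    variances = scores.var(axis=0, ddof=1)
    a, b = np.sqrt(variances * t2_crit)

    center = scores.mean(axis=0)
    t = np.linspace(0, 2 * np.pi, n_points)
    return center[0] + a * np.cos(t), center[1] + b * np.sin(t)
```

With the scores from the answer, `x, y = hotelling_t2_ellipse(dataProject)` followed by `plt.plot(x, y, color='red')` would draw this variant. For 105 observations it comes out slightly larger than the chi-square ellipse.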

Plot


