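I've recently been using Python for data mining and ran into a really nasty problem while clustering. I searched all over the web and couldn't find a solution. Without further ado, here's the code: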
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
# KMeans clustering
df1=df23
kmeans = KMeans(n_clusters=5, random_state=10).fit(df1)
# Attach the cluster label assigned to each sample
df1['level']=kmeans.labels_
#df1.to_csv('new_df.csv')
# Count the samples in each cluster (dict-style renaming inside agg() no longer works in recent pandas)
df2 = df1.groupby('level').size().reset_index(name='num')
print(df2.head())
# Reduce the features of the clustered data to 2 dimensions
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(df1))
print(new_pca.head())
# Visualization: one marker style per cluster
d = new_pca[df1['level'] == 0]
plt.plot(d[0], d[1], 'gv')
d = new_pca[df1['level'] == 1]
plt.plot(d[0], d[1], 'ko')
d = new_pca[df1['level'] == 2]
plt.plot(d[0], d[1], 'b*')
d = new_pca[df1['level'] == 3]
plt.plot(d[0], d[1], 'y+')
d = new_pca[df1['level'] == 4]
plt.plot(d[0], d[1], 'c.')
plt.title('Clustering result')
plt.show()
Running this code raised an error (shown as a screenshot in the original post).
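I searched online for a long time without finding a fix, even though this exact code had worked before. So I took a look at the df23 data (shown as a screenshot in the original post).
The only difference from the DataFrame that had worked before was the index. Important things are worth saying three times: the index, the index, the index! So I went looking for a way to reset the index. Here's the code: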
# Keep only the feature columns, then rebuild a default 0..n-1 index
cols = ["forks_count", "has_issues", "has_wiki", "open_issues_count", "stargazers_count", "watchers_count", "created_pushed_time", "created_updated_time"]
# drop=True discards the old index instead of inserting it as a new column
df24 = df23[cols].reset_index(drop=True)
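To see why the index matters, here is a minimal sketch that reproduces the mismatch on made-up data. It assumes the failure came from the boolean selection `new_pca[df1['level'] == 0]`: the PCA output is wrapped in a fresh DataFrame with a 0..n-1 index, while the mask still carries the original index. The names `features`, `labels`, `coords`, `projected` and `try_select` are hypothetical, and the numbers are placeholders rather than real PCA output.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for df23: the index is 7, 12, 30 instead of 0, 1, 2,
# as can happen after filtering or merging rows.
features = pd.DataFrame(
    {"forks_count": [1, 50, 3], "stargazers_count": [10, 900, 20]},
    index=[7, 12, 30],
)
# Hypothetical cluster labels that share the features' index.
labels = pd.Series([0, 1, 0], index=features.index, name="level")

# Stand-in for the PCA output (made-up numbers). Wrapping a NumPy array in a
# DataFrame without passing index= gives a fresh 0..n-1 index.
coords = np.array([[0.1, 0.2], [5.0, 4.0], [0.3, 0.1]])
projected = pd.DataFrame(coords)

mask = labels == 0  # boolean Series indexed by 7, 12, 30


def try_select(frame, bool_mask):
    """Return frame[bool_mask], or None if pandas cannot align the mask."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # hide the reindexing warning
            return frame[bool_mask]
    except Exception as exc:  # broad on purpose: avoids depending on where pandas exposes IndexingError
        print(f"selection failed: {type(exc).__name__}")
        return None


# Mismatched indexes: none of 7, 12, 30 exists in projected's index.
failed = try_select(projected, mask)

# Option 1: reset the label index so it matches the 0..n-1 index.
by_reset = projected[labels.reset_index(drop=True) == 0]

# Option 2: give the projected frame the original index up front.
by_shared_index = pd.DataFrame(coords, index=features.index)[mask]

# Option 3: ignore the index entirely and select by position.
by_position = projected[mask.to_numpy()]
```

Resetting the index, as in the fix above, corresponds to option 1. Options 2 and 3 are alternatives that leave the original DataFrame's index untouched, which may be preferable if that index is needed later.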
Source: CSDN
Author: yuanninesuns
Link: https://blog.csdn.net/yuanninesuns/article/details/103974744