scikit-learn

partially define initial centroid for scikit-learn K-Means clustering

偶尔善良 · submitted 2021-02-08 10:56:40
Question: The scikit-learn documentation states: "Method for initialization: 'k-means++': selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers." My data has 10 (predicted) clusters and 7 features. However, I would like to pass an array of shape 10 by 6, i.e. I want 6 dimensions of the centroids to be predefined by me, but the 7th…
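KMeans only accepts a full (n_clusters, n_features) init array, so one workaround is to fill in the missing 7th coordinate yourself before passing it. A minimal sketch, assuming synthetic data and using the column mean of the 7th feature as the filler value (any data-derived guess would do):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))               # 500 samples, 7 features
partial_centers = rng.normal(size=(10, 6))  # user-chosen values for features 0-5

# KMeans requires a full (n_clusters, n_features) init array, so fill the
# 7th coordinate from the data (here: the column mean of feature 6).
seventh = np.full((10, 1), X[:, 6].mean())
init_centers = np.hstack([partial_centers, seventh])  # shape (10, 7)

# n_init=1 because an explicit init array makes repeated restarts pointless.
km = KMeans(n_clusters=10, init=init_centers, n_init=1, random_state=0).fit(X)
print(km.cluster_centers_.shape)  # (10, 7)
```

Note that the fixed coordinates are only the *starting* point; Lloyd iterations will still move all 7 coordinates of every centroid during fitting.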

GMM/EM on time series cluster

你。 · submitted 2021-02-08 10:07:40
Question: According to a paper, this is supposed to work, but as a learner of the scikit-learn package I do not see how. All the sample code clusters by ellipses or circles, as here. I would really like to know how to cluster the following plot by different patterns... Features 0-3 are the mean power over certain time periods (divided into 4), while features 4, 5 and 6 correspond to the standard deviation over the year, the weekday/weekend variance, and the winter/summer variance, respectively. So the y-label does not necessarily match features 4, 5 and 6.
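In scikit-learn, GMM/EM clustering of such summary features is done with GaussianMixture on the feature matrix itself, not on the raw series. A minimal sketch with synthetic stand-in data (the 7-feature layout and component count are assumptions, not from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))  # stand-in for the 7 time-series summary features

# Scaling matters: features on very different scales (mean power vs.
# variances) would otherwise dominate the Gaussian likelihood.
Xs = StandardScaler().fit_transform(X)

# Each mixture component models one cluster with a full covariance matrix.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
labels = gmm.fit_predict(Xs)
print(labels.shape)  # (300,)
```

The ellipses in the scikit-learn examples are just 2-D visualizations of each component's covariance; in 7 dimensions the model works the same way, it simply cannot be drawn as ellipses.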

How to implement KNN to impute categorical features in a sklearn pipeline

谁说胖子不能爱 · submitted 2021-02-08 09:43:09
Question: I want to use KNN to impute categorical features in a sklearn pipeline (multiple categorical features are missing). I have done quite a bit of research on existing KNN solutions (fancyimpute, sklearn's KNeighborsRegressor). None of them seems to work in a sklearn pipeline for imputing categorical features. Some of my questions are (any advice is highly appreciated): is there any existing approach that allows using KNN (or any other regressor) to impute missing values (categorical in this case…
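scikit-learn's KNNImputer only handles numeric input, so one common workaround is to encode categories as integer codes, impute, and round back. A rough sketch with hypothetical column names; note that rounding averaged codes imposes an artificial ordering on the categories, so this is a heuristic, not a principled categorical distance:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with missing categorical values (hypothetical columns).
df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red", "blue", np.nan],
    "size":  ["S", np.nan, "L", "M", "S", "L"],
})

def to_codes(s):
    # pandas encodes NaN as code -1; restore it to NaN so KNNImputer sees it.
    codes = s.astype("category").cat.codes.astype(float)
    return codes.where(codes >= 0)

encoded = df.apply(to_codes)
imputed = KNNImputer(n_neighbors=2).fit_transform(encoded)

# Round the averaged neighbor codes back to the nearest category code.
codes = np.rint(imputed).astype(int)
decoded = pd.DataFrame({
    col: pd.Categorical.from_codes(codes[:, i],
                                   df[col].astype("category").cat.categories)
    for i, col in enumerate(df.columns)
})
print(decoded)
```

To make this pipeline-compatible, the encode/impute/decode steps could be wrapped in a custom transformer implementing `fit`/`transform`; for a mode-of-neighbors rule instead of rounding, a KNeighborsClassifier trained per column on the non-missing rows is the usual alternative.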

memory error when todense in python using CountVectorizer

非 Y 不嫁゛ · submitted 2021-02-08 09:11:38
Question: Here is my code, and the memory error occurs when calling todense(). I am using a GBDT model and wondering if anyone has good ideas for working around the memory error. Thanks.

    for feature_colunm_name in feature_columns_to_use:
        X_train[feature_colunm_name] = CountVectorizer().fit_transform(X_train[feature_colunm_name]).todense()
        X_test[feature_colunm_name] = CountVectorizer().fit_transform(X_test[feature_colunm_name]).todense()
    y_train = y_train.astype('int')
    grd = GradientBoostingClassifier(n_estimators=n…
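The usual workaround is to avoid todense() entirely: GradientBoostingClassifier accepts sparse input, so the per-column count matrices can be stacked with scipy's sparse hstack. A sketch with toy data and hypothetical column names (also fixing a separate bug in the original snippet: test data must be transformed with the vectorizers *fitted on training data*, not refit):

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy data standing in for the asker's text columns (hypothetical names).
X_train = pd.DataFrame({
    "title": ["red apple", "green pear", "red pear", "green apple"],
    "body":  ["sweet fruit", "sour fruit", "sweet snack", "sour snack"],
})
y_train = [0, 1, 0, 1]

# Keep everything sparse: one vectorizer per column, fitted on train only,
# then hstack the CSR matrices instead of densifying with todense().
vectorizers = {c: CountVectorizer().fit(X_train[c]) for c in X_train.columns}
X_sparse = hstack([vectorizers[c].transform(X_train[c])
                   for c in X_train.columns]).tocsr()

# GradientBoostingClassifier accepts sparse matrices directly.
grd = GradientBoostingClassifier(n_estimators=50).fit(X_sparse, y_train)
print(grd.score(X_sparse, y_train))
```

At prediction time, reuse the same `vectorizers` dict on the test frame; fitting fresh vectorizers on the test set (as the original loop does) produces incompatible vocabularies.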

can't import nearest neighbors in scikit-learn 0.16

允我心安 · submitted 2021-02-08 08:38:16
Question:

    Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sklearn
    >>> sklearn.__version__
    '0.16.1'
    >>> from sklearn.neighbors import NearestNeighbors
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/sklearn/neighbors/__init__.py", line 9, in <module>
        from .graph…

Why is KNN slow with custom metric?

蓝咒 · submitted 2021-02-08 08:22:27
Question: I work with a data set consisting of about 200k objects. Every object has 4 features. I classify them with k-nearest neighbors (KNN) using the Euclidean metric; the process finishes in about 20 seconds. Lately I have found a reason to use a custom metric, which will probably give better results. I implemented the custom metric, and KNN started taking more than an hour; I didn't wait for it to finish. I assumed my metric was the reason for this, so I replaced its body with return 1. KNN still ran for more…
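The slowdown is not the metric's arithmetic but the call mechanism: built-in metric names dispatch to compiled code, while a Python callable is invoked once per pair of points from the interpreter, which is why even return 1 stays slow. A small timing sketch on synthetic data (sizes reduced from the 200k in the question so it runs quickly):

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))  # smaller stand-in for the 200k x 4 data

def my_metric(a, b):
    # Same math as 'euclidean', but called once per pair from Python,
    # so interpreter overhead dominates the runtime.
    return np.sqrt(((a - b) ** 2).sum())

t0 = time.perf_counter()
NearestNeighbors(metric="euclidean").fit(X).kneighbors(X[:100])
fast = time.perf_counter() - t0

t0 = time.perf_counter()
NearestNeighbors(metric=my_metric).fit(X).kneighbors(X[:100])
slow = time.perf_counter() - t0

print(f"builtin: {fast:.3f}s  callable: {slow:.3f}s")
```

If the custom distance can be written with compiled primitives (e.g. metric="minkowski" with a chosen p, a precomputed distance matrix, or a Cython/numba-compiled function), the per-call overhead disappears.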
