scikit-learn

partially define initial centroid for scikit-learn K-Means clustering

偶尔善良 · submitted 2021-02-08 10:56:40
Question: The scikit-learn documentation states: "Method for initialization: 'k-means++': selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers." My data has 10 (predicted) clusters and 7 features. However, I would like to pass an array of shape 10 by 6, i.e. I want 6 dimensions of the centroids to be predefined by me, but the 7th…
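KMeans only accepts a full (n_clusters, n_features) init array, so one workaround is to fill in the missing 7th coordinate yourself before passing it. A minimal sketch, assuming synthetic data and using the column mean of the 7th feature as the filler value (any data-derived guess would do):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))               # 500 samples, 7 features
partial_centers = rng.normal(size=(10, 6))  # user-chosen values for features 0-5

# KMeans requires a full (n_clusters, n_features) init array, so fill the
# 7th coordinate from the data (here: the column mean of feature 6).
seventh = np.full((10, 1), X[:, 6].mean())
init_centers = np.hstack([partial_centers, seventh])  # shape (10, 7)

# n_init=1 because an explicit init array makes repeated restarts pointless.
km = KMeans(n_clusters=10, init=init_centers, n_init=1, random_state=0).fit(X)
print(km.cluster_centers_.shape)  # (10, 7)
```

Note that the fixed coordinates are only the *starting* point; Lloyd iterations will still move all 7 coordinates of every centroid during fitting.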

GMM/EM on time series cluster

你。 · submitted 2021-02-08 10:07:40
Question: According to a paper, this is supposed to work, but as a learner of the scikit-learn package I do not see how. All the sample code clusters by ellipses or circles, as here. I would really like to know how to cluster the following plot by different patterns... Features 0-3 are the mean power over certain time periods (divided into 4), while features 4, 5 and 6 correspond to the standard deviation over the year, the weekday/weekend variance, and the winter/summer variance, respectively. So the y-label does not necessarily match features 4, 5 and 6.
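In scikit-learn, GMM/EM clustering of such summary features is done with GaussianMixture on the feature matrix itself, not on the raw series. A minimal sketch with synthetic stand-in data (the 7-feature layout and component count are assumptions, not from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))  # stand-in for the 7 time-series summary features

# Scaling matters: features on very different scales (mean power vs.
# variances) would otherwise dominate the Gaussian likelihood.
Xs = StandardScaler().fit_transform(X)

# Each mixture component models one cluster with a full covariance matrix.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
labels = gmm.fit_predict(Xs)
print(labels.shape)  # (300,)
```

The ellipses in the scikit-learn examples are just 2-D visualizations of each component's covariance; in 7 dimensions the model works the same way, it simply cannot be drawn as ellipses.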

How to implement KNN to impute categorical features in a sklearn pipeline

谁说胖子不能爱 · submitted 2021-02-08 09:43:09
Question: I want to use KNN to impute categorical features in a sklearn pipeline (multiple categorical features are missing). I have done quite a bit of research on existing KNN solutions (fancyimpute, sklearn's KNeighborsRegressor). None of them seems to work in a sklearn pipeline for imputing categorical features. Some of my questions are (any advice is highly appreciated): is there any existing approach that allows using KNN (or any other regressor) to impute missing values (categorical in this case…
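scikit-learn's KNNImputer only handles numeric input, so one common workaround is to encode categories as integer codes, impute, and round back. A rough sketch with hypothetical column names; note that rounding averaged codes imposes an artificial ordering on the categories, so this is a heuristic, not a principled categorical distance:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with missing categorical values (hypothetical columns).
df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red", "blue", np.nan],
    "size":  ["S", np.nan, "L", "M", "S", "L"],
})

def to_codes(s):
    # pandas encodes NaN as code -1; restore it to NaN so KNNImputer sees it.
    codes = s.astype("category").cat.codes.astype(float)
    return codes.where(codes >= 0)

encoded = df.apply(to_codes)
imputed = KNNImputer(n_neighbors=2).fit_transform(encoded)

# Round the averaged neighbor codes back to the nearest category code.
codes = np.rint(imputed).astype(int)
decoded = pd.DataFrame({
    col: pd.Categorical.from_codes(codes[:, i],
                                   df[col].astype("category").cat.categories)
    for i, col in enumerate(df.columns)
})
print(decoded)
```

To make this pipeline-compatible, the encode/impute/decode steps could be wrapped in a custom transformer implementing `fit`/`transform`; for a mode-of-neighbors rule instead of rounding, a KNeighborsClassifier trained per column on the non-missing rows is the usual alternative.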

memory error when todense in python using CountVectorizer

非 Y 不嫁゛ · submitted 2021-02-08 09:11:38
Question: Here is my code, and the memory error occurs when calling todense(). I am using a GBDT model and wondering if anyone has good ideas for working around the memory error. Thanks.

    for feature_colunm_name in feature_columns_to_use:
        X_train[feature_colunm_name] = CountVectorizer().fit_transform(X_train[feature_colunm_name]).todense()
        X_test[feature_colunm_name] = CountVectorizer().fit_transform(X_test[feature_colunm_name]).todense()
    y_train = y_train.astype('int')
    grd = GradientBoostingClassifier(n_estimators=n…
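The usual workaround is to avoid todense() entirely: GradientBoostingClassifier accepts sparse input, so the per-column count matrices can be stacked with scipy's sparse hstack. A sketch with toy data and hypothetical column names (also fixing a separate bug in the original snippet: test data must be transformed with the vectorizers *fitted on training data*, not refit):

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy data standing in for the asker's text columns (hypothetical names).
X_train = pd.DataFrame({
    "title": ["red apple", "green pear", "red pear", "green apple"],
    "body":  ["sweet fruit", "sour fruit", "sweet snack", "sour snack"],
})
y_train = [0, 1, 0, 1]

# Keep everything sparse: one vectorizer per column, fitted on train only,
# then hstack the CSR matrices instead of densifying with todense().
vectorizers = {c: CountVectorizer().fit(X_train[c]) for c in X_train.columns}
X_sparse = hstack([vectorizers[c].transform(X_train[c])
                   for c in X_train.columns]).tocsr()

# GradientBoostingClassifier accepts sparse matrices directly.
grd = GradientBoostingClassifier(n_estimators=50).fit(X_sparse, y_train)
print(grd.score(X_sparse, y_train))
```

At prediction time, reuse the same `vectorizers` dict on the test frame; fitting fresh vectorizers on the test set (as the original loop does) produces incompatible vocabularies.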

can't import nearest neighbors in scikit-learn 0.16

允我心安 · submitted 2021-02-08 08:38:16
Question:

    Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sklearn
    >>> sklearn.__version__
    '0.16.1'
    >>> from sklearn.neighbors import NearestNeighbors
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/sklearn/neighbors/__init__.py", line 9, in <module>
        from .graph…

Why is KNN slow with custom metric?

蓝咒 · submitted 2021-02-08 08:22:27
Question: I work with a data set consisting of about 200k objects. Every object has 4 features. I classify them with k-nearest neighbors (KNN) using the Euclidean metric; the process finishes in about 20 seconds. Lately I have found a reason to use a custom metric, which will probably give better results. I implemented the custom metric, and KNN started taking more than an hour; I didn't wait for it to finish. I assumed my metric was the reason for this, so I replaced its body with return 1. KNN still ran for more…
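The slowdown is not the metric's arithmetic but the call mechanism: built-in metric names dispatch to compiled code, while a Python callable is invoked once per pair of points from the interpreter, which is why even return 1 stays slow. A small timing sketch on synthetic data (sizes reduced from the 200k in the question so it runs quickly):

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))  # smaller stand-in for the 200k x 4 data

def my_metric(a, b):
    # Same math as 'euclidean', but called once per pair from Python,
    # so interpreter overhead dominates the runtime.
    return np.sqrt(((a - b) ** 2).sum())

t0 = time.perf_counter()
NearestNeighbors(metric="euclidean").fit(X).kneighbors(X[:100])
fast = time.perf_counter() - t0

t0 = time.perf_counter()
NearestNeighbors(metric=my_metric).fit(X).kneighbors(X[:100])
slow = time.perf_counter() - t0

print(f"builtin: {fast:.3f}s  callable: {slow:.3f}s")
```

If the custom distance can be written with compiled primitives (e.g. metric="minkowski" with a chosen p, a precomputed distance matrix, or a Cython/numba-compiled function), the per-call overhead disappears.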
