scikit-learn

Gensim LDA for text classification

99封情书 submitted on 2021-02-10 09:54:10
Question: I am posting my question here because there are already some answers on how to use scikit methods with gensim, like scikit vectorizers with gensim or this, but I haven't seen the whole pipeline used for text classification. I will try to explain my situation a little. I want to use the LDA methods implemented in gensim and then proceed to text classification. I have one dataset which consists of three parts (train (25K), test (25K) and unlabeled data (50K)). What I am trying to do is

How do you override Google AI Platform's standard libraries (i.e. upgrade scikit-learn) and install other libraries for custom prediction routines?

痴心易碎 submitted on 2021-02-10 07:45:06
Question: I'm currently building a pipeline and trying to see if I can get an ML model deployed in AI Platform's prediction service, then use it later in other projects via the HTTP request that the prediction service offers. However, the model was built with a scikit-learn version higher than the one offered for the prediction runtime version 1.15 (this is the current version supported by Google for scikit-learn predictions). This runtime version only supports

sklearn dimensionality issues “Found array with dim 3. Estimator expected <= 2”

倖福魔咒の submitted on 2021-02-09 11:13:22
Question: I am trying to use KNN to classify .wav files into two groups, group 0 and group 1. I extracted the data, created the model, and fit the model; however, when I try to use the .predict() method I get the following error:

    Traceback (most recent call last):
      File "/..../....../KNN.py", line 20, in <module>
        classifier.fit(X_train, y_train)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py", line 761, in fit
        X, y = check_X_y(X, y,
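
The error in the title usually means the feature array has one axis too many: scikit-learn estimators expect X with shape (n_samples, n_features). A minimal sketch, assuming each .wav file has already been turned into a fixed-size 2D feature matrix (the shapes and variable names below are hypothetical and not taken from the question), is to flatten every sample into one row before calling fit:

```python
import numpy as np

# Hypothetical stand-in for extracted audio features:
# 8 clips, each described by a 20 x 13 matrix (e.g. frames x coefficients).
rng = np.random.default_rng(0)
X_3d = rng.normal(size=(8, 20, 13))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Keep the sample axis and collapse the remaining axes into one feature axis.
X_2d = X_3d.reshape(X_3d.shape[0], -1)

print(X_3d.ndim, X_2d.shape)  # 3 (8, 260)
```

X_2d could then be passed to classifier.fit(X_2d, y), and the same reshape would have to be applied to any array given to .predict(). Flattening discards the 2D layout of each sample; averaging over the time axis is another option if that layout is not meaningful to the model.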

Subsample size in scikit-learn RandomForestClassifier

走远了吗. submitted on 2021-02-09 08:21:11
Question: How can I control the size of the subsample used to train each tree in the forest? According to the scikit-learn documentation: A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default
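
The behaviour quoted from the documentation is ordinary bootstrap sampling. The sketch below uses plain Python to show what drawing with replacement means and what a smaller per-tree subsample would look like; the helper `bootstrap_indices` and its `size` argument are hypothetical names for illustration, not scikit-learn internals. In scikit-learn 0.22 and later, `RandomForestClassifier` also exposes a `max_samples` parameter that limits the number of samples drawn per tree when `bootstrap=True`.

```python
import random

def bootstrap_indices(n_samples, size=None, seed=None):
    """Draw row indices with replacement; size defaults to n_samples."""
    rng = random.Random(seed)
    if size is None:
        size = n_samples
    return rng.choices(range(n_samples), k=size)

n = 1000
full = bootstrap_indices(n, seed=0)             # same size as the input
small = bootstrap_indices(n, size=200, seed=0)  # a reduced subsample

# Sampling with replacement repeats some rows, so fewer than n are distinct.
print(len(full), len(set(full)) < n, len(small))
```

Even when the subsample is as large as the original data, each tree sees a different multiset of rows, which is where the variance reduction from averaging comes from.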

How to apply a ScikitLearn classifier to tiles/windows in a large image

社会主义新天地 submitted on 2021-02-08 20:53:36
Question: Given a trained classifier in scikit-learn, e.g. a RandomForestClassifier, that has been trained on samples of size e.g. 25x25, how can I easily apply it to all tiles/windows in a large image (e.g. 640x480)? What I could do is (slow code ahead!):

    x_train = np.arange(25*25*1000).reshape(25,25,1000)  # just some pseudo training data
    y_train = np.arange(1000)  # just some pseudo training labels
    clf = RandomForestClassifier()
    clf.fit( ... )  # train the classifier
    img = np.arange(640
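
One way to avoid an explicit Python loop over windows is to build a strided view of all windows and classify them in a single predict call. The following is only a sketch under stated assumptions: NumPy 1.20 or later (for `sliding_window_view`), a single-channel image, and a small stand-in image to keep memory modest. The `step` variable is a hypothetical knob, and the classifier itself is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

tile = 25
img = np.arange(64 * 48, dtype=np.float64).reshape(64, 48)  # small stand-in image

# View of every 25x25 window: shape (64-25+1, 48-25+1, 25, 25), no copy yet.
windows = sliding_window_view(img, (tile, tile))

# Optional stride between windows (assumption: step 1 means every position).
step = 1
windows = windows[::step, ::step]

# One row per window, matching the (n_samples, n_features) layout.
X = windows.reshape(-1, tile * tile)  # this reshape copies the data

print(windows.shape, X.shape)  # (40, 24, 25, 25) (960, 625)
```

Assuming the classifier was fit on flattened rows of 625 features, `clf.predict(X).reshape(windows.shape[:2])` would give one label per window position. For a full 640x480 image with step 1, X would hold 616 * 456 rows of 625 values, roughly 1.4 GB as float64, so processing in batches of rows or using a larger step is likely necessary.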

partially define initial centroid for scikit-learn K-Means clustering

孤街醉人 submitted on 2021-02-08 10:57:22
Question: The scikit-learn documentation states: Method for initialization: ‘k-means++’ : selects initial cluster centers for k-mean clustering in a smart way to speed up convergence. See section Notes in k_init for more details. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. My data has 10 (predicted) clusters and 7 features. However, I would like to pass an array of shape 10 by 6, i.e. I want 6 dimensions of each centroid to be predefined by me, but 7th
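
Because an ndarray passed as `init` must have the full shape (n_clusters, n_features), one possible workaround is to fill in the missing coordinate before passing the array. The sketch below is an illustration, not the answer to the question: the data and the 10 by 6 array are random placeholders, and starting the 7th coordinate at the mean of that feature is an assumption chosen only for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))        # placeholder data with 7 features
partial = rng.normal(size=(10, 6))   # placeholder: 6 known coordinates per centroid

# Assumption: start the unknown 7th coordinate at the mean of that feature.
seventh = np.full((10, 1), X[:, 6].mean())
init_centers = np.hstack([partial, seventh])

print(init_centers.shape)  # (10, 7)
```

The result could then be passed as `KMeans(n_clusters=10, init=init_centers, n_init=1)`. Note that K-Means updates every coordinate of every centroid during fitting, so the six predefined dimensions act only as starting values and are not held fixed.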