scikit-learn | 易学教程

Gensim LDA for text classification

Question: I am posting here because there are already some answers on how to use scikit-learn methods with gensim (for example, scikit-learn vectorizers with gensim), but I haven't seen the whole pipeline used for text classification. Let me explain my situation: I want to use gensim's LDA implementation as a step toward text classification. I have one dataset that consists of three parts: train (25K), test (25K), and unlabeled data (50K). What I am trying to do is

How do you override Google AI platform's standard library's (i.e upgrade scikit-learn) and install other libraries for custom prediction routines?

Question: I'm currently building a pipeline and trying to see whether I can deploy an ML model to AI Platform's prediction service, then use it later in other projects via the HTTP requests that the prediction service offers. However, the model was built with a scikit-learn version higher than the one offered by prediction runtime version 1.15 (the current runtime version supported by Google for scikit-learn predictions). This runtime version only supports
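For context, a custom prediction routine is built around a small predictor class with a `predict` method and a `from_path` class method. The sketch below follows that interface as far as I recall it from the AI Platform documentation; the class name `SklearnPredictor`, the file name `model.joblib`, and the local smoke test are assumptions for illustration. It does not by itself settle whether a newer scikit-learn can be installed over the version bundled with the runtime.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression


class SklearnPredictor:
    """Hypothetical predictor following the from_path / predict interface."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Return plain Python lists so the result can be serialized to JSON.
        return self._model.predict(instances).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # "model.joblib" is an assumed file name, not from the question.
        model = joblib.load(os.path.join(model_dir, "model.joblib"))
        return cls(model)


# Local smoke test: save a tiny model, then load it the way the service would.
model = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
with tempfile.TemporaryDirectory() as model_dir:
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
    predictor = SklearnPredictor.from_path(model_dir)
    predictions = predictor.predict([[0.0], [3.0]])
```

Testing the class locally like this makes version mismatches between the training and serving environments easier to spot before deployment.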

sklearn dimensionality issues “Found array with dim 3. Estimator expected <= 2”

Question: I am trying to use KNN to classify .wav files into two groups, group 0 and group 1. I extracted the data, created the model, and fit the model; however, when I try to use the .predict() method I get the following error: Traceback (most recent call last): File "/..../....../KNN.py", line 20, in <module> classifier.fit(X_train, y_train) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py", line 761, in fit X, y = check_X_y(X, y,
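scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), so a 3-D array of per-file feature matrices has to be flattened first. A minimal sketch, assuming each .wav file was turned into a fixed-size 2-D feature matrix; the random data and the shape (40, 13, 20) are hypothetical stand-ins for the asker's features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical features: 40 clips, each a 13 x 20 matrix (a 3-D array).
X = rng.normal(size=(40, 13, 20))
y = np.array([0, 1] * 20)

# Flatten every clip into one row: (40, 13, 20) -> (40, 260).
X_2d = X.reshape(len(X), -1)

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_2d, y)

# Inputs passed to predict() must be flattened the same way.
predictions = classifier.predict(X_2d[:5])
```

Flattening only works when every clip yields a matrix of the same size; clips of different lengths would need padding, truncation, or summary statistics first.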

Subsample size in scikit-learn RandomForestClassifier

Question: How can I control the size of the subsample used to train each tree in the forest? According to the scikit-learn documentation: A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default
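The documentation quoted in the question describes older releases. Since scikit-learn 0.22, `RandomForestClassifier` accepts a `max_samples` parameter that sets how many samples each tree draws when `bootstrap=True`. A minimal sketch on synthetic data; the dataset and the value 0.25 are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each tree is trained on a bootstrap sample of 25% of the rows.
# max_samples may be a float fraction or an absolute int count.
clf = RandomForestClassifier(
    n_estimators=50,
    bootstrap=True,
    max_samples=0.25,
    random_state=0,
)
clf.fit(X, y)
accuracy = clf.score(X, y)
```

With `max_samples=None` (the default) each tree still draws as many samples as there are rows, which matches the behaviour described in the quoted documentation.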

How to apply a ScikitLearn classifier to tiles/windows in a large image

Question: Given a trained classifier in scikit-learn, e.g. a RandomForestClassifier, that has been trained on samples of size e.g. 25x25, how can I easily apply it to all tiles/windows in a large image (e.g. 640x480)? What I could do is (slow code ahead!) x_train = np.arange(25*25*1000).reshape(25,25,1000) # just some pseudo training data y_train = np.arange(1000) # just some pseudo training labels clf = RandomForestClassifier() clf.train( ... ) #train the classifier img = np.arange(640
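One way to avoid an explicit Python loop over windows is to build all windows as a strided view and classify them in a single `predict` call. The sketch below assumes NumPy 1.20 or later for `sliding_window_view`; the helper `classify_windows`, the random training data, and the small 60x80 image are hypothetical, chosen to keep memory use low.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.ensemble import RandomForestClassifier

WINDOW = 25
rng = np.random.default_rng(0)

# Pseudo training data: 200 flattened 25x25 samples with binary labels.
X_train = rng.random((200, WINDOW * WINDOW))
y_train = rng.integers(0, 2, size=200)
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, y_train)


def classify_windows(clf, img, window, step=1):
    """Hypothetical helper: label every window x window tile of a 2-D image."""
    # A view of shape (rows, cols, window, window); no data is copied yet.
    views = sliding_window_view(img, (window, window))[::step, ::step]
    rows, cols = views.shape[:2]
    # The reshape copies the windows into one (n_windows, n_features) matrix.
    flat = views.reshape(rows * cols, window * window)
    return clf.predict(flat).reshape(rows, cols)


img = rng.random((60, 80))
label_map = classify_windows(clf, img, WINDOW, step=5)
```

For a full 640x480 image with `step=1` the flattened matrix has roughly 280,000 rows of 625 features, so processing the windows in batches of rows, or using a larger step, keeps memory manageable.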

partially define initial centroid for scikit-learn K-Means clustering

Question: The scikit-learn documentation states: Method for initialization: ‘k-means++’ : selects initial cluster centers for k-mean clustering in a smart way to speed up convergence. See section Notes in k_init for more details. If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. My data has 10 (predicted) clusters and 7 features. However, I would like to pass an array of shape 10 by 6, i.e. I want 6 dimensions of each centroid to be predefined by me, but 7th
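Because `init` must have shape (n_clusters, n_features), one possible workaround is to complete the 10 x 6 array with a placeholder value for the seventh coordinate before passing it to `KMeans`. The sketch below fills it with the mean of the seventh feature; that fill rule, the synthetic data, and all variable names are assumptions for illustration, not something the question or the documentation prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

rng = np.random.default_rng(0)

# Synthetic data: 10 clusters in 7 dimensions.
true_centers = rng.uniform(-10, 10, size=(10, 7))
X, _ = make_blobs(n_samples=500, centers=true_centers,
                  cluster_std=0.5, random_state=0)

# The six coordinates assumed to be known in advance, shape (10, 6).
partial_centers = true_centers[:, :6]

# Assumption: start the unknown 7th coordinate at the mean of that feature.
seventh = np.full((10, 1), X[:, 6].mean())
init_centers = np.hstack([partial_centers, seventh])  # shape (10, 7)

# n_init=1 because an explicit init array makes restarts pointless.
kmeans = KMeans(n_clusters=10, init=init_centers, n_init=1).fit(X)
```

Note that `KMeans` only uses the array as a starting point: all seven coordinates of every centroid are updated during fitting, so the six predefined dimensions are not held fixed.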