scikit-learn

ImportError: DLL load failed with import sklearn

拥有回忆 submitted on 2021-02-11 07:02:02
Question: I made a program using scikit-learn, and it worked fine for months. Yesterday, when I ran it again, it showed the error ImportError: DLL load failed: The specified module could not be found. I searched for an answer on StackOverflow and other websites, and I also checked the requirements. I am using PyCharm Community Edition 2019.2.1, 64-bit. My library versions:

joblib==0.14.0
numpy==1.17.2
pandas==0.25.1
scikit-learn==0.21.3
scipy==1.3.1
python==3.7.4, 32 bit

Error in the line: import
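
The excerpt cuts off before the failing import, but the version list itself points at a frequent culprit: a 32-bit Python 3.7.4 running binary wheels on a 64-bit machine, where a stale or mismatched numpy/scipy DLL breaks the sklearn import. A minimal diagnostic sketch (nothing below is from the question itself):

```python
# Check which interpreter and architecture are actually running the script.
import platform
import struct
import sys

print(sys.executable)              # the exact python.exe PyCharm is using
print(sys.version)                 # full interpreter version string
print(platform.architecture()[0])  # '32bit' or '64bit'
print(struct.calcsize("P") * 8)    # pointer width in bits, same answer

# If the architecture is not what you expect, reinstalling the binary stack
# into that interpreter often clears the DLL error, e.g.:
#   python -m pip install --force-reinstall numpy scipy scikit-learn
```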

Binary Classification using N-Grams

萝らか妹 submitted on 2021-02-11 06:51:48
Question: I want to extract the n-grams of tweets from two groups of users (0/1) and build a CSV file like the following for a binary classifier:

user_tweets, ngram1, ngram2, ngram3, ..., label
1, 0.0, 0.0, 0.0, ..., 0
2, 0.0, 0.0, 0.0, ..., 1
..

My question is whether I should first extract the important n-grams of the two groups and then score each n-gram found in a user's tweets, or is there an easier way to do this?

Source: https://stackoverflow.com/questions/66092089/binary-classification-using-the
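
There is an easier way along the lines the asker suspects: scikit-learn's text vectorizers extract and score n-grams in one step, producing exactly the n-gram-per-column matrix sketched above. A minimal sketch with made-up tweets (the data and model choice are illustrative, not from the question):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the two user groups (label 0 vs label 1).
tweets = ["great day today", "buy now limited offer",
          "lovely weather this morning", "click this link now"]
labels = [0, 1, 0, 1]

# ngram_range=(1, 3) extracts unigrams, bigrams and trigrams; each column of
# X holds the tf-idf score of one n-gram, like the CSV columns above.
vec = TfidfVectorizer(ngram_range=(1, 3))
X = vec.fit_transform(tweets)
print(vec.get_feature_names_out()[:5])  # get_feature_names() on sklearn < 1.0

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["limited offer today"])))
```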

The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24

冷暖自知 submitted on 2021-02-11 05:33:44
Question: I'm using the DecisionTreeClassifier from scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) and getting the following warning: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API. I'm a bit confused about why I'm
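
The warning text itself contains the fix: import from the public sklearn.tree package rather than the private sklearn.tree.tree module. If your own code already does that, the deprecated import usually comes from a pickled model or a third-party package built against sklearn < 0.22. A small sketch:

```python
# Public import path: raises no FutureWarning on sklearn 0.22/0.23.
from sklearn.tree import DecisionTreeClassifier

# If the warning comes from a dependency you cannot change, it can be
# silenced explicitly (a workaround, not a fix):
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")

clf = DecisionTreeClassifier(max_depth=3)
```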

SVM-OVO vs SVM-OVA in a very basic example

泄露秘密 submitted on 2021-02-10 22:42:17
Question: Trying to understand how SVM-OVR (One-vs-Rest) works, I was testing the following code:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC

x = np.array([[1, 1.1], [1, 2], [2, 1]])
y = np.array([0, 100, 250])
classifier = SVC(kernel='linear', decision_function_shape='ovr')
classifier.fit(x, y)
print(classifier.predict([[1, 2]]))
print(classifier.decision_function([[1, 2]]))
```

The outputs are:

[100]
[[ 1.05322128  2.1947332  -0.20488118]]

It means that the sample [1,2] is
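
A point worth knowing when reading that output: SVC always trains one-vs-one (OVO) classifiers internally, and decision_function_shape only changes how the scores are reported, not how the model is fit. A sketch contrasting the two shapes on the same toy data:

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([[1, 1.1], [1, 2], [2, 1]])
y = np.array([0, 100, 250])

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(x, y)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(x, y)

# OVO: one column per class pair -> 3 columns for 3 classes
# (0 vs 100, 0 vs 250, 100 vs 250).
print(ovo.decision_function([[1, 2]]))

# OVR shape: still 3 columns, but one aggregated score per class;
# predict() returns the class with the largest score, here 100.
print(ovr.decision_function([[1, 2]]))
print(ovr.predict([[1, 2]]))
```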

Fine Tuning hyperparameters doesn't improve score of classifiers

十年热恋 submitted on 2021-02-10 18:30:31
Question: I am experiencing a problem where fine-tuning the hyperparameters using GridSearchCV doesn't really improve my classifiers. I figured the improvement should be bigger than that. The biggest improvement I've gotten for a classifier with my current code is around ±0.03. I have a dataset with eight columns and an unbalanced binary outcome. For scoring I use f1, and I use KFold with 10 splits. I was hoping someone could spot something that is off or that I should look at. Thank you! I use the
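
The excerpt is cut off before the actual code, but with an unbalanced binary outcome and f1 scoring, two settings often move the score more than the grid itself: stratifying the folds and weighting the classes. A sketch of that setup (the classifier and grid are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in: 8 features, roughly 90/10 class imbalance.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

param_grid = {
    "max_depth": [None, 5, 10],
    "class_weight": [None, "balanced"],  # often matters more than depth
}

# StratifiedKFold keeps the class ratio identical in every fold, which plain
# KFold does not guarantee and which matters for a stable f1 estimate.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=cv, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```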

Shape error when using PolynomialFeatures

匆匆过客 submitted on 2021-02-10 16:54:32
Question: The Issue: To begin with, I'm pretty new to machine learning. I have decided to test some of the things that I have learned on some financial data. My machine learning model looks like this:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("/Users/Documents/Trading.csv")
poly_features = PolynomialFeatures(degree=2, include_bias=False)
linear_reg = LinearRegression(fit_intercept=True)
X = df_copy[[
```
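
The excerpt ends before the error, but the most common shape error with PolynomialFeatures is passing it a 1-D array: fit_transform expects a 2-D matrix with one sample per row. A self-contained sketch of the fix (the data here is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([1.0, 2.0, 3.0, 4.0])  # 1-D, shape (4,) -> would raise an error
y = np.array([1.1, 4.2, 8.9, 16.1])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_2d = X.reshape(-1, 1)              # shape (4, 1): one sample per row
X_poly = poly.fit_transform(X_2d)    # shape (4, 2): columns [x, x**2]

reg = LinearRegression(fit_intercept=True).fit(X_poly, y)
print(reg.predict(poly.transform([[5.0]])))  # prediction for x = 5
```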

Splitting training data with an equal number of rows for each class

自古美人都是妖i submitted on 2021-02-10 15:51:29
Question: I have a very large dataset of about 314554097 rows and 3 columns. The third column is the class. The dataset has two classes, 0 and 1. I need to split the data into test and training data. To split the data I can use from sklearn.cross_validation import train_test_split: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.75, random_state=0). But the dataset contains about 99 percent class 0 and only 1 percent class 1. In the training dataset, I need an equal number of
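
Two notes on the excerpt: sklearn.cross_validation was removed in 0.20 (the import is now sklearn.model_selection), and with a 99/1 imbalance a plain random split can leave the rare class badly represented. A sketch of one approach: stratify the split, then undersample the majority class in the training set (the synthetic data is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
y = (rng.random(10_000) < 0.01).astype(int)  # ~99% class 0, ~1% class 1

# stratify=y keeps the 99/1 ratio identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Undersample class 0 in the training set to match the class-1 count.
idx0 = np.flatnonzero(y_train == 0)
idx1 = np.flatnonzero(y_train == 1)
keep = np.concatenate([rng.choice(idx0, size=len(idx1), replace=False), idx1])
X_bal, y_bal = X_train[keep], y_train[keep]
print(np.bincount(y_bal))  # equal counts for both classes
```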

Sklearn Gaussian Mixture lock parameters?

故事扮演 submitted on 2021-02-10 15:49:21
Question: I'm trying to fit some Gaussians for which I already have a pretty good idea of the initial parameters (in this case I'm generating the distributions, so I should always be able to fit them). However, I can't seem to figure out how to force the mean to be e.g. 0 for both Gaussians. Is it possible? m.means_ = ... doesn't work.

```python
from sklearn import mixture
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy import stats

a = np.random.normal(0, 0.2, 500)
b = np.random
```
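
scikit-learn's GaussianMixture has no public option to freeze means (means_init only sets the starting point; EM then updates it). One known workaround is to override the private _m_step hook and re-pin the means after every M-step. This relies on internal API that may change between releases, and the covariances are still estimated against the unpinned means before the reset, so treat it strictly as a sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class FixedMeansGMM(GaussianMixture):
    """GaussianMixture whose component means are pinned to fixed values."""

    def __init__(self, fixed_means):
        fixed_means = np.asarray(fixed_means, dtype=float)
        super().__init__(n_components=len(fixed_means), means_init=fixed_means)
        self.fixed_means = fixed_means

    def _m_step(self, X, log_resp):
        super()._m_step(X, log_resp)           # update weights/covariances
        self.means_ = self.fixed_means.copy()  # ...then re-pin the means

# Two zero-mean Gaussians with different spreads, as in the question.
a = np.random.normal(0, 0.2, 500)
b = np.random.normal(0, 1.0, 500)
data = np.concatenate([a, b]).reshape(-1, 1)

m = FixedMeansGMM(fixed_means=[[0.0], [0.0]]).fit(data)
print(m.means_.ravel())                  # stays [0, 0]
print(np.sqrt(m.covariances_.ravel()))   # roughly recovers the two std devs
```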