scikit-learn

ImportError: DLL load failed with import sklearn

拥有回忆 submitted on 2021-02-11 07:02:02
Question: I made a program using scikit-learn, and it worked fine for months. Yesterday, when I ran it again, it showed the error ImportError: DLL load failed: The specified module could not be found. I searched for an answer on StackOverflow and other websites, and I also checked the requirements. I am using PyCharm Community Edition 2019.2.1, 64-bit. My library versions:

joblib==0.14.0
numpy==1.17.2
pandas==0.25.1
scikit-learn==0.21.3
scipy==1.3.1
python==3.7.4, 32 bit

Error in the line: import
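
The excerpt cuts off before the failing import, but the version list itself points at a frequent culprit: a 32-bit Python 3.7.4 running binary wheels on a 64-bit machine, where a stale or mismatched numpy/scipy DLL breaks the sklearn import. A minimal diagnostic sketch (nothing below is from the question itself):

```python
# Check which interpreter and architecture are actually running the script.
import platform
import struct
import sys

print(sys.executable)              # the exact python.exe PyCharm is using
print(sys.version)                 # full interpreter version string
print(platform.architecture()[0])  # '32bit' or '64bit'
print(struct.calcsize("P") * 8)    # pointer width in bits, same answer

# If the architecture is not what you expect, reinstalling the binary stack
# into that interpreter often clears the DLL error, e.g.:
#   python -m pip install --force-reinstall numpy scipy scikit-learn
```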

Binary Classification using N-Grams

萝らか妹 submitted on 2021-02-11 06:51:48
Question: I want to extract the n-grams of tweets from two groups of users (0/1) and build a CSV file like the following for a binary classifier:

user_tweets, ngram1, ngram2, ngram3, ..., label
1, 0.0, 0.0, 0.0, ..., 0
2, 0.0, 0.0, 0.0, ..., 1
..

My question is whether I should first extract the important n-grams of the two groups and then score each n-gram found in a user's tweets, or is there an easier way to do this?

Source: https://stackoverflow.com/questions/66092089/binary-classification-using-the
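
There is an easier way along the lines the asker suspects: scikit-learn's text vectorizers extract and score n-grams in one step, producing exactly the n-gram-per-column matrix sketched above. A minimal sketch with made-up tweets (the data and model choice are illustrative, not from the question):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the two user groups (label 0 vs label 1).
tweets = ["great day today", "buy now limited offer",
          "lovely weather this morning", "click this link now"]
labels = [0, 1, 0, 1]

# ngram_range=(1, 3) extracts unigrams, bigrams and trigrams; each column of
# X holds the tf-idf score of one n-gram, like the CSV columns above.
vec = TfidfVectorizer(ngram_range=(1, 3))
X = vec.fit_transform(tweets)
print(vec.get_feature_names_out()[:5])  # get_feature_names() on sklearn < 1.0

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["limited offer today"])))
```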

The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24

冷暖自知 submitted on 2021-02-11 05:33:44
Question: I'm using the DecisionTreeClassifier from scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) and getting the following warning: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API. I'm a bit confused about why I'm
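
The warning text itself contains the fix: import from the public sklearn.tree package rather than the private sklearn.tree.tree module. If your own code already does that, the deprecated import usually comes from a pickled model or a third-party package built against sklearn < 0.22. A small sketch:

```python
# Public import path: raises no FutureWarning on sklearn 0.22/0.23.
from sklearn.tree import DecisionTreeClassifier

# If the warning comes from a dependency you cannot change, it can be
# silenced explicitly (a workaround, not a fix):
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")

clf = DecisionTreeClassifier(max_depth=3)
```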

SVM-OVO vs SVM-OVA in a very basic example

泄露秘密 submitted on 2021-02-10 22:42:17
Question: Trying to understand how SVM-OVR (One-vs-Rest) works, I was testing the following code:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC

x = np.array([[1, 1.1], [1, 2], [2, 1]])
y = np.array([0, 100, 250])
classifier = SVC(kernel='linear', decision_function_shape='ovr')
classifier.fit(x, y)
print(classifier.predict([[1, 2]]))
print(classifier.decision_function([[1, 2]]))
```

The outputs are:

[100]
[[ 1.05322128  2.1947332  -0.20488118]]

It means that the sample [1,2] is
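
A point worth knowing when reading that output: SVC always trains one-vs-one (OVO) classifiers internally, and decision_function_shape only changes how the scores are reported, not how the model is fit. A sketch contrasting the two shapes on the same toy data:

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([[1, 1.1], [1, 2], [2, 1]])
y = np.array([0, 100, 250])

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(x, y)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(x, y)

# OVO: one column per class pair -> 3 columns for 3 classes
# (0 vs 100, 0 vs 250, 100 vs 250).
print(ovo.decision_function([[1, 2]]))

# OVR shape: still 3 columns, but one aggregated score per class;
# predict() returns the class with the largest score, here 100.
print(ovr.decision_function([[1, 2]]))
print(ovr.predict([[1, 2]]))
```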

Fine Tuning hyperparameters doesn't improve score of classifiers

十年热恋 submitted on 2021-02-10 18:30:31
Question: I am experiencing a problem where fine-tuning the hyperparameters using GridSearchCV doesn't really improve my classifiers. I figured the improvement should be bigger than that. The biggest improvement I've gotten for a classifier with my current code is around ±0.03. I have a dataset with eight columns and an unbalanced binary outcome. For scoring I use f1, and I use KFold with 10 splits. I was hoping someone could spot something that is off or that I should look at. Thank you! I use the
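
The excerpt is cut off before the actual code, but with an unbalanced binary outcome and f1 scoring, two settings often move the score more than the grid itself: stratifying the folds and weighting the classes. A sketch of that setup (the classifier and grid are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in: 8 features, roughly 90/10 class imbalance.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

param_grid = {
    "max_depth": [None, 5, 10],
    "class_weight": [None, "balanced"],  # often matters more than depth
}

# StratifiedKFold keeps the class ratio identical in every fold, which plain
# KFold does not guarantee and which matters for a stable f1 estimate.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=cv, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```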

Shape error when using PolynomialFeatures

匆匆过客 submitted on 2021-02-10 16:54:32
Question: The Issue: To begin with, I'm pretty new to machine learning. I have decided to test some of the things that I have learned on some financial data. My machine learning model looks like this:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("/Users/Documents/Trading.csv")
poly_features = PolynomialFeatures(degree=2, include_bias=False)
linear_reg = LinearRegression(fit_intercept=True)
X = df_copy[[
```
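
The excerpt ends before the error, but the most common shape error with PolynomialFeatures is passing it a 1-D array: fit_transform expects a 2-D matrix with one sample per row. A self-contained sketch of the fix (the data here is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([1.0, 2.0, 3.0, 4.0])  # 1-D, shape (4,) -> would raise an error
y = np.array([1.1, 4.2, 8.9, 16.1])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_2d = X.reshape(-1, 1)              # shape (4, 1): one sample per row
X_poly = poly.fit_transform(X_2d)    # shape (4, 2): columns [x, x**2]

reg = LinearRegression(fit_intercept=True).fit(X_poly, y)
print(reg.predict(poly.transform([[5.0]])))  # prediction for x = 5
```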

Splitting training data with an equal number of rows for each class

自古美人都是妖i submitted on 2021-02-10 15:51:29
Question: I have a very large dataset of about 314554097 rows and 3 columns. The third column is the class. The dataset has two classes, 0 and 1. I need to split the data into test and training data. To split the data I can use from sklearn.cross_validation import train_test_split: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.75, random_state=0). But the dataset contains about 99 percent class 0 and only 1 percent class 1. In the training dataset, I need an equal number of
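
Two notes on the excerpt: sklearn.cross_validation was removed in 0.20 (the import is now sklearn.model_selection), and with a 99/1 imbalance a plain random split can leave the rare class badly represented. A sketch of one approach: stratify the split, then undersample the majority class in the training set (the synthetic data is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
y = (rng.random(10_000) < 0.01).astype(int)  # ~99% class 0, ~1% class 1

# stratify=y keeps the 99/1 ratio identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Undersample class 0 in the training set to match the class-1 count.
idx0 = np.flatnonzero(y_train == 0)
idx1 = np.flatnonzero(y_train == 1)
keep = np.concatenate([rng.choice(idx0, size=len(idx1), replace=False), idx1])
X_bal, y_bal = X_train[keep], y_train[keep]
print(np.bincount(y_bal))  # equal counts for both classes
```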

Sklearn Gaussian Mixture lock parameters?

故事扮演 submitted on 2021-02-10 15:49:21
Question: I'm trying to fit some Gaussians for which I already have a pretty good idea of the initial parameters (in this case I'm generating the distributions, so I should always be able to fit them). However, I can't seem to figure out how to force the mean to be e.g. 0 for both Gaussians. Is it possible? m.means_ = ... doesn't work.

```python
from sklearn import mixture
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy import stats

a = np.random.normal(0, 0.2, 500)
b = np.random
```
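
scikit-learn's GaussianMixture has no public option to freeze means (means_init only sets the starting point; EM then updates it). One known workaround is to override the private _m_step hook and re-pin the means after every M-step. This relies on internal API that may change between releases, and the covariances are still estimated against the unpinned means before the reset, so treat it strictly as a sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class FixedMeansGMM(GaussianMixture):
    """GaussianMixture whose component means are pinned to fixed values."""

    def __init__(self, fixed_means):
        fixed_means = np.asarray(fixed_means, dtype=float)
        super().__init__(n_components=len(fixed_means), means_init=fixed_means)
        self.fixed_means = fixed_means

    def _m_step(self, X, log_resp):
        super()._m_step(X, log_resp)           # update weights/covariances
        self.means_ = self.fixed_means.copy()  # ...then re-pin the means

# Two zero-mean Gaussians with different spreads, as in the question.
a = np.random.normal(0, 0.2, 500)
b = np.random.normal(0, 1.0, 500)
data = np.concatenate([a, b]).reshape(-1, 1)

m = FixedMeansGMM(fixed_means=[[0.0], [0.0]]).fit(data)
print(m.means_.ravel())                  # stays [0, 0]
print(np.sqrt(m.covariances_.ravel()))   # roughly recovers the two std devs
```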