sklearn-pandas | 易学教程

Error when trying to import sklearn modules : ImportError: DLL load failed: The specified module could not be found

Question: I tried the following imports for a machine learning project:

    from sklearn import preprocessing, cross_validation, svm
    from sklearn.linear_model import LinearRegression

I got this error message:

    Traceback (most recent call last):
      File "C:/Users/Abdelhalim/PycharmProjects/ML/stock pricing.py", line 4, in <module>
        from sklearn import preprocessing, cross_validation, svm
      File "C:\Python27\lib\site-packages\sklearn\__init__.py", line 57, in <module>
        from .base import clone
      File "C:

Python Sklearn Linear Regression Value Error

Question: I've been trying out linear regression using sklearn. Sometimes I get a ValueError, and sometimes it works fine. I'm not sure which approach to use. The error message is as follows:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 512, in fit
        y_numeric=True, multi_output=True)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages

Getting decision path to a node in sklearn

Question: I wanted the decision path (i.e. the set of rules) from the root node to a given node (which I supply) in a decision tree (DecisionTreeClassifier) in scikit-learn. clf.decision_path specifies the nodes a sample goes through, which may help in getting the set of rules followed by the sample, but how do you get the set of rules up to a particular node in the tree?

Answer 1: For the decision rules of the nodes, using the iris dataset:

    from sklearn.datasets import load_iris
    from sklearn import tree
    import graphviz

    iris = load_iris()
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(iris.data, iris.target)
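The excerpt stops before the answer reaches the rule extraction itself. A minimal sketch of one way to do it follows, assuming the fitted tree's `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`) are used to walk from the requested node back to the root. The helper name `rules_to_node` and the choice of target node are hypothetical and not taken from the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def rules_to_node(clf, target_node, feature_names):
    """Return the split conditions on the path from the root to target_node.

    Hypothetical helper: it only reads the public arrays of clf.tree_.
    """
    tree = clf.tree_
    # Build a child -> (parent, went_left) lookup from every internal node.
    parent = {}
    for node in range(tree.node_count):
        left = int(tree.children_left[node])
        right = int(tree.children_right[node])
        if left != right:  # leaves store -1 for both children
            parent[left] = (node, True)
            parent[right] = (node, False)

    if target_node != 0 and target_node not in parent:
        raise ValueError(f"node {target_node} is not in the tree")

    rules = []
    node = target_node
    while node != 0:  # node 0 is always the root
        node, went_left = parent[node]
        op = "<=" if went_left else ">"
        name = feature_names[tree.feature[node]]
        rules.append(f"{name} {op} {tree.threshold[node]:.3f}")
    return rules[::-1]  # root first


iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Illustrative target: the last node id in the fitted tree.
for rule in rules_to_node(clf, clf.tree_.node_count - 1, iris.feature_names):
    print(rule)
```

Scanning all nodes once to build the parent lookup keeps the walk simple. For repeated queries on a large tree the lookup could be built once and reused.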

Append tfidf to pandas dataframe

Question: I have the following pandas structure:

    col1  col2  col3  text
    1     1     0     meaningful text
    5     9     7     trees
    7     8     2     text

I'd like to vectorise it using a tfidf vectoriser. This, however, returns a sparse matrix, which I can turn into a dense matrix via (mysparsematrix).toarray(). However, how can I add this info, with labels, to my original df? The target would look like:

    col1  col2  col3  meaningful  text  trees
    1     1     0     1           1     0
    5     9     7     0           0     1
    7     8     2     0           1     0

UPDATE: Solution makes the concatenation wrong even when

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

Question: I applied logistic regression to the training set after splitting the data set into training and test sets, but I got the above error. When I print my response vector y_train in the console, it shows integer values like 0 or 1, but when I write it to a file, the values are floats like 0.0 and 1.0. If that's the problem, how can I overcome it?

    lenreg = LogisticRegression()
    print y_train[0:10]
    y_train.to_csv(path='ytard.csv')
    lenreg.fit(X_train,
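The error text says the training labels contain a single class, so the float formatting is probably not the cause. A minimal sketch of how one might check this and keep both classes in the training split follows, assuming synthetic, imbalanced data (the arrays `X` and `y` below are made up for illustration) and a stratified `train_test_split`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data with float labels, as in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0.0] * 90 + [1.0] * 10)

# stratify=y keeps the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Inspect which classes actually reached the training set.
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes, counts)))

if len(classes) < 2:
    raise ValueError("y_train holds a single class; check the split or the filtering")

# Float labels such as 0.0 and 1.0 are accepted as class labels.
model = LogisticRegression().fit(X_train, y_train)
print(model.classes_)
```

If the check shows a single class even with stratification, the labels were likely reduced to one class earlier in the pipeline, for example by a filter or a join.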

Use FeatureUnion in scikit-learn to combine two pandas columns for tfidf

Question: While using this as a model for spam classification, I'd like to add an additional feature: the Subject plus the body. I have all of my features in a pandas dataframe. For example, the subject is df['Subject'], the body is df['body_text'] and the spam/ham label is df['ham/spam']. I receive the following error:

    TypeError: 'FeatureUnion' object is not iterable

How can I use both df['Subject'] and df['body_text'] as features while running them through the pipeline function?

    from sklearn.pipeline import FeatureUnion
    features = df[['Subject', 'body_text']].values
    combined_2 = FeatureUnion(list
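The asker's full code is cut off, so the exact cause of the TypeError cannot be confirmed here. As a sketch of one arrangement that does work, `FeatureUnion` can be given a list of `(name, transformer)` pairs, where each transformer is a small pipeline that first selects one column and then applies `TfidfVectorizer`. The sample rows, the helper `column_tfidf`, and the choice of `LogisticRegression` as the classifier are assumptions made for illustration.

```python
from operator import itemgetter

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up rows using the column names from the question.
df = pd.DataFrame(
    {
        "Subject": ["win money now", "meeting agenda", "cheap pills", "lunch tomorrow"],
        "body_text": [
            "click here to claim your prize",
            "see attached notes",
            "buy now limited offer",
            "are you free at noon",
        ],
        "ham/spam": ["spam", "ham", "spam", "ham"],
    }
)


def column_tfidf(column):
    """Hypothetical helper: pick one text column, then vectorise it."""
    return Pipeline(
        [
            ("select", FunctionTransformer(itemgetter(column))),
            ("tfidf", TfidfVectorizer()),
        ]
    )


model = Pipeline(
    [
        (
            "features",
            FeatureUnion(
                [
                    ("subject", column_tfidf("Subject")),
                    ("body", column_tfidf("body_text")),
                ]
            ),
        ),
        ("clf", LogisticRegression()),
    ]
)

# Pass the DataFrame itself (not .values) so columns can be selected by name.
X = df[["Subject", "body_text"]]
model.fit(X, df["ham/spam"])
print(model.predict(X))
```

`ColumnTransformer` from `sklearn.compose` is a common alternative for per-column preprocessing and avoids the explicit selector step.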

How to normalize the Train and Test data using MinMaxScaler sklearn

Question: I have a doubt and have been looking for answers. When I use:

    import pandas as pd
    from sklearn import preprocessing

    min_max_scaler = preprocessing.MinMaxScaler()
    df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
    df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
    df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

after which I will train and test the model (A and B as features, C as the label) and get some accuracy score. Now my doubt is, what happens when I have
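The question is cut off, but its title asks how to normalize both training and test data. A common pattern, sketched here under the assumption that the data is split before scaling, is to fit `MinMaxScaler` on the training rows only and then reuse the fitted scaler on the test rows (and on any later data). The split size and `random_state` are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {
        "A": [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
        "B": [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19],
        "C": ["Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "Y", "N", "N", "Y", "Y"],
    }
)
# Encode the label as in the question: 'N' -> 0, anything else -> 1.
df["C"] = (df["C"].str.strip() != "N").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["A", "B"]], df["C"], test_size=0.3, random_state=0
)

scaler = MinMaxScaler()
# Learn the per-column min and max from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same min and max to the test rows; values may fall outside [0, 1].
X_test_scaled = scaler.transform(X_test)

print(scaler.data_min_, scaler.data_max_)
print(X_test_scaled)
```

Fitting on the full frame before splitting, as in the question's snippet, lets test-set statistics leak into training, which is why the split comes first here.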