sklearn-pandas

Error when trying to import sklearn modules : ImportError: DLL load failed: The specified module could not be found

我只是一个虾纸丫 submitted on 2019-12-01 17:29:12
Question: I tried to do the following imports for a machine learning project:

    from sklearn import preprocessing, cross_validation, svm
    from sklearn.linear_model import LinearRegression

I got this error message:

    Traceback (most recent call last):
      File "C:/Users/Abdelhalim/PycharmProjects/ML/stock pricing.py", line 4, in <module>
        from sklearn import preprocessing, cross_validation, svm
      File "C:\Python27\lib\site-packages\sklearn\__init__.py", line 57, in <module>
        from .base import clone
      File "C:

Python Sklearn Linear Regression Value Error

主宰稳场 submitted on 2019-12-01 11:04:38
Question: I've been trying out linear regression using sklearn. Sometimes I get a ValueError, sometimes it works fine. I'm not sure which approach to use. The error message is as follows:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 512, in fit
        y_numeric=True, multi_output=True)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages

Getting decision path to a node in sklearn

巧了我就是萌 submitted on 2019-12-01 07:12:59
I wanted the decision path (i.e., the set of rules) from the root node to a given node (which I supply) in a decision tree (DecisionTreeClassifier) in scikit-learn. clf.decision_path specifies the nodes a sample goes through, which may help in getting the set of rules followed by the sample, but how do you get the set of rules up to a particular node in the tree?

For the decision rules of the nodes using the iris dataset:

    from sklearn.datasets import load_iris
    from sklearn import tree
    import graphviz

    iris = load_iris()
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(iris.data, iris.target)

Getting decision path to a node in sklearn

回眸只為那壹抹淺笑 submitted on 2019-12-01 04:28:57
Question: I wanted the decision path (i.e., the set of rules) from the root node to a given node (which I supply) in a decision tree (DecisionTreeClassifier) in scikit-learn. clf.decision_path specifies the nodes a sample goes through, which may help in getting the set of rules followed by the sample, but how do you get the set of rules up to a particular node in the tree?

Answer 1: For the decision rules of the nodes using the iris dataset:

    from sklearn.datasets import load_iris
    from sklearn import tree
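One way to recover the rules leading to a specific node is to walk the fitted tree's internal arrays from that node back up to the root. The following is a minimal sketch, not the answer's own code: the helper name `rules_to_node` and the choice of target node are hypothetical, and it relies on the `tree_` attributes `children_left`, `children_right`, `feature`, and `threshold`, where a child index of -1 marks a leaf.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def rules_to_node(clf, node_id, feature_names):
    """Return the split conditions on the path from the root to node_id (hypothetical helper)."""
    tree = clf.tree_
    left, right = tree.children_left, tree.children_right

    # Build a child -> (parent, went_left) map from the children arrays.
    parent = {}
    for node in range(tree.node_count):
        if left[node] != -1:  # -1 marks a leaf in both children arrays
            parent[int(left[node])] = (node, True)
            parent[int(right[node])] = (node, False)

    # Climb from the requested node to the root, recording each split on the way.
    rules = []
    current = int(node_id)
    while current in parent:
        up, went_left = parent[current]
        op = "<=" if went_left else ">"
        name = feature_names[tree.feature[up]]
        rules.append(f"{name} {op} {tree.threshold[up]:.3f}")
        current = up
    return rules[::-1]  # root-first order


iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

target_node = clf.tree_.node_count - 1  # arbitrary node chosen for illustration
for rule in rules_to_node(clf, target_node, iris.feature_names):
    print(rule)
```

Unlike `clf.decision_path`, which needs a sample, this works for any node index, including internal nodes; the root (node 0) yields an empty rule list.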

Append tfidf to pandas dataframe

戏子无情 submitted on 2019-12-01 03:47:11
Question: I have the following pandas structure:

    col1  col2  col3  text
    1     1     0     meaningful text
    5     9     7     trees
    7     8     2     text

I'd like to vectorise it using a tfidf vectoriser. This, however, returns a sparse matrix, which I can turn into a dense matrix via mysparsematrix.toarray(). However, how can I add this info with labels to my original df? So the target would look like:

    col1  col2  col3  meaningful  text  trees
    1     1     0     1           1     0
    5     9     7     0           0     1
    7     8     2     0           1     0

UPDATE: Solution makes the concatenation wrong even when

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

牧云@^-^@ submitted on 2019-12-01 02:35:05
Question: I applied logistic regression to the training set after splitting the data set into test and train sets, but I got the above error. I tried to work it out, and when I printed my response vector y_train in the console it showed integer values like 0 or 1. But when I wrote it to a file I found the values were floats like 0.0 and 1.0. If that's the problem, how can I overcome it?

    lenreg = LogisticRegression()
    print y_train[0:10]
    y_train.to_csv(path='ytard.csv')
    lenreg.fit(X_train,
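The error message itself says that the training labels contain only the class 0.0, so a useful first step is to count the classes in `y_train` before fitting; the float representation alone should not trigger it. The sketch below uses hypothetical toy data (the question's real data is not shown) and a stratified split, which keeps both classes in the training set when the positive class is rare.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, only 4 of them positive.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(20, 3), columns=["f1", "f2", "f3"])
y = pd.Series([0.0] * 16 + [1.0] * 4, name="label")

# stratify=y preserves the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Inspect the class balance before fitting.
print(y_train.value_counts())
if y_train.nunique() < 2:
    raise ValueError("y_train holds a single class; check the split or any filtering step")

# Casting to int is optional here; it only makes the class labels read as 0 and 1.
lenreg = LogisticRegression().fit(X_train, y_train.astype(int))
print(lenreg.classes_)
```

If `value_counts()` shows a single class, the cause lies upstream of the solver, for example in how the rows were split or filtered.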

use Featureunion in scikit-learn to combine two pandas columns for tfidf

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-30 18:40:22
While using this as a model for spam classification, I'd like to add an additional feature of the Subject plus the body. I have all of my features in a pandas dataframe. For example, the subject is df['Subject'], the body is df['body_text'] and the spam/ham label is df['ham/spam']. I receive the following error:

    TypeError: 'FeatureUnion' object is not iterable

How can I use both df['Subject'] and df['body_text'] as features while running them through the pipeline function?

    from sklearn.pipeline import FeatureUnion
    features = df[['Subject', 'body_text']].values
    combined_2 = FeatureUnion(list

How to normalize the Train and Test data using MinMaxScaler sklearn

时光怂恿深爱的人放手 submitted on 2019-11-30 11:12:11
Question: I have this doubt and have been looking for answers. The question is, when I use:

    import pandas as pd
    from sklearn import preprocessing

    min_max_scaler = preprocessing.MinMaxScaler()
    df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
    df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
    df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

After which I will train and test the model

use Featureunion in scikit-learn to combine two pandas columns for tfidf

試著忘記壹切 submitted on 2019-11-30 02:51:53
Question: While using this as a model for spam classification, I'd like to add an additional feature of the Subject plus the body. I have all of my features in a pandas dataframe. For example, the subject is df['Subject'], the body is df['body_text'] and the spam/ham label is df['ham/spam']. I receive the following error:

    TypeError: 'FeatureUnion' object is not iterable

How can I use both df['Subject'] and df['body_text'] as features while running them through the pipeline function?

    from sklearn
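A minimal sketch of one way to feed two text columns through separate TF-IDF vectorisers, assuming the column names from the question and hypothetical toy e-mails. `FeatureUnion` expects a list of `(name, transformer)` pairs and is itself placed as a named step inside a `Pipeline`; here each branch first selects one column with a `FunctionTransformer`. The classifier choice is only illustrative.

```python
from operator import itemgetter

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "Subject": ["Win a free prize", "Meeting agenda", "Cheap loans now", "Lunch tomorrow"],
    "body_text": [
        "Click here to claim your free prize",
        "Please find the agenda attached",
        "Get cheap loans with no credit check",
        "Are you free for lunch tomorrow",
    ],
    "ham/spam": ["spam", "ham", "spam", "ham"],
})


def text_branch(column):
    """Select one DataFrame column, then vectorise it with TF-IDF."""
    return Pipeline([
        ("select", FunctionTransformer(itemgetter(column))),
        ("tfidf", TfidfVectorizer()),
    ])


features = FeatureUnion([
    ("subject", text_branch("Subject")),
    ("body", text_branch("body_text")),
])

model = Pipeline([
    ("features", features),        # the union is one named step, not the step list itself
    ("clf", LogisticRegression()),
])

X = df[["Subject", "body_text"]]   # keep the DataFrame; .values would lose the column names
model.fit(X, df["ham/spam"])
print(model.predict(X))
```

`ColumnTransformer` from `sklearn.compose` is an alternative that handles the column selection directly.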

How to normalize the Train and Test data using MinMaxScaler sklearn

牧云@^-^@ submitted on 2019-11-30 02:32:11
I have this doubt and have been looking for answers. The question is, when I use:

    import pandas as pd
    from sklearn import preprocessing

    min_max_scaler = preprocessing.MinMaxScaler()
    df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
    df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
    df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

After which I will train and test the model (A, B as features, C as label) and get some accuracy score. Now my doubt is, what happens when I have
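A common way to handle scaling with separate train and test sets is to fit the scaler on the training rows only and then reuse the learned minimum and maximum for the test rows (and for any later data). The sketch below reuses the question's DataFrame; the split parameters are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "A": [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
    "B": [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19],
    "C": ["Y", "Y", "Y", "Y", "N", "N", "N", "Y", "N", "Y", "N", "N", "Y", "Y"],
})
df["C"] = (df["C"].str.strip() != "N").astype(int)  # 'N' -> 0, anything else -> 1

# Split first, so the test rows never influence the scaling parameters.
X_train, X_test, y_train, y_test = train_test_split(
    df[["A", "B"]], df["C"], test_size=0.3, random_state=0
)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min/max from the training rows
X_test_scaled = scaler.transform(X_test)        # applies the same min/max to the test rows

print(scaler.data_min_, scaler.data_max_)
```

Because the test rows are scaled with the training range, some test values can fall slightly outside [0, 1]; that is expected and mirrors what happens with unseen data.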