sklearn-pandas

statsmodels raises TypeError: ufunc 'isfinite' not supported for the input types

本小妞迷上赌 submitted on 2020-06-17 14:10:37
Question: I am applying backward elimination using statsmodels.api, and the code raises this error:

    TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I have no clue how to solve it. Here is the code:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn
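
The question's code is cut off, so the actual cause cannot be confirmed. One common way to hit this `isfinite` error is a feature matrix stored with `dtype=object`, which NumPy's numeric ufuncs refuse to process. The sketch below reproduces that situation with a small hypothetical array (`X_obj` is an assumption, not the asker's data) and shows that casting to `float` removes the error.

```python
import numpy as np

# Hypothetical feature matrix stored as dtype=object (for example, after
# slicing a mixed-type DataFrame with .values). Not the asker's data.
X_obj = np.array([[1, 2.5], [3, 4.0]], dtype=object)

# np.isfinite has no loop for object arrays, so it raises TypeError.
raised = False
try:
    np.isfinite(X_obj)
except TypeError:
    raised = True

# Casting to a numeric dtype gives the ufunc something it can work with.
X_num = X_obj.astype(float)
all_finite = bool(np.isfinite(X_num).all())

print(raised, X_num.dtype, all_finite)
```

If the cast itself fails, the array still contains non-numeric entries (such as unencoded category strings) that need to be encoded or dropped first.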

Sklearn_pandas in a pipeline returns TypeError: 'builtin_function_or_method' object is not iterable

时光毁灭记忆、已成空白 submitted on 2020-05-15 05:10:36
Question: I have a data set with categorical and numerical features on which I want to apply some transformations, followed by XGBClassifier. Link to the data set: https://www.kaggle.com/blastchar/telco-customer-churn

Because the transformations differ for the numerical and categorical features, I used sklearn_pandas and its DataFrameMapper. To perform one-hot encoding on the categorical features, I want to use DictVectorizer. But to use DictVectorizer, I first need to convert the dataframe into a dict,
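
The conversion step described above can be sketched as follows. This is a minimal illustration outside of any pipeline or DataFrameMapper, and the column names and values are hypothetical stand-ins rather than the real Kaggle data. In general, a "'builtin_function_or_method' object is not iterable" message appears when a function object is iterated instead of its return value, but the truncated question does not show whether that is what happened here.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical categorical columns; values are made up for illustration.
df = pd.DataFrame(
    {
        "Contract": ["Monthly", "Yearly", "Monthly"],
        "InternetService": ["DSL", "Fiber", "DSL"],
    }
)

# DictVectorizer expects an iterable of dicts, one dict per row.
records = df.to_dict(orient="records")

# sparse=False returns a dense NumPy array that is easy to inspect.
vec = DictVectorizer(sparse=False)
encoded = vec.fit_transform(records)

print(vec.feature_names_)
print(encoded)
```

String values become one column per `name=value` pair, while numeric values would pass through as a single column.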

python sklearn multiple linear regression display r-squared

Deadly submitted on 2020-05-09 18:07:05
Question: I calculated my multiple linear regression equation and I want to see the adjusted R-squared. I know that the score function lets me see R-squared, but it is not adjusted.

    import pandas as pd  # import the pandas module
    import numpy as np

    df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')
    df

       AverageNumberofTickets  NumberofEmployees  ValueofContract  Industry
    0                       1                 51            25750    Retail
    1                       9                 68            25000  Services
    2                      20                 67            40000  Services
    3                       1                124            35000    Retail
    4                       8                124            25000
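
As a rough sketch of what the question asks for, adjusted R-squared can be derived from the value returned by `score` using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The data below is synthetic and generated only for illustration; it is not the asker's CSV.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 50 samples, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)

# score() returns the ordinary (unadjusted) R-squared.
r2 = model.score(X, y)

# Adjust for the number of predictors p given n samples.
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R-squared: {r2:.4f}, adjusted R-squared: {adj_r2:.4f}")
```

The adjustment penalizes extra predictors, so the adjusted value is lower than the plain R-squared whenever the fit is not perfect.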

pd.get_dummies dataframe same size when Sparse = True as when Sparse = False

冷暖自知 submitted on 2020-02-25 04:06:52
Question: I have a dataframe with several string columns that I want to convert to categorical data so that I can run some models and extract the important features. However, because of the number of unique values, the one-hot encoded data expands into a large number of columns, which is causing performance issues. To combat this, I'm experimenting with the sparse=True parameter in get_dummies.

    test1 = pd.get_dummies(X.loc[:,['col1','col2','col3','col4']].head(10000))
    test2 = pd.get_dummies(X.loc[:,[
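
One way to check what `sparse=True` actually changes is to compare column dtypes and memory usage rather than the frame's shape, which is the same in both cases. The sketch below uses a single hypothetical high-cardinality column (the name `col1` is borrowed from the question, the values are made up) and does not attempt to explain the result the asker observed.

```python
import pandas as pd

# Hypothetical column with 200 distinct string values over 10,000 rows.
n_rows = 10_000
X = pd.DataFrame({"col1": [f"v{i % 200}" for i in range(n_rows)]})

dense = pd.get_dummies(X)
sparse = pd.get_dummies(X, sparse=True)

# Both frames have the same shape; only the storage differs.
dense_bytes = dense.memory_usage(deep=True).sum()
sparse_bytes = sparse.memory_usage(deep=True).sum()

# With sparse=True the dummy columns use a pandas SparseDtype.
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in sparse.dtypes)

print(dense.shape, sparse.shape)
print(dense_bytes, sparse_bytes, all_sparse)
```

Note that some downstream operations convert sparse columns back to dense arrays, so the saving may not carry through to model fitting.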

How to perform OneHotEncoding in Sklearn, getting value error

二次信任 submitted on 2020-01-22 17:11:17
Question: I just started learning machine learning. While practicing one of the tasks I get a ValueError, even though I followed the same steps as the instructor. Please help.

    dff

      Country     Name
    0     AUS      Sri
    1     USA  Vignesh
    2     IND    Pechi
    3     USA      Raj

First I performed label encoding:

    X=dff.values
    label_encoder=LabelEncoder()
    X[:,0]=label_encoder.fit_transform(X[:,0])

    out: X
    array([[0, 'Sri'], [2, 'Vignesh'], [1, 'Pechi'], [2, 'Raj']], dtype=object)

then performed One hot encoding for the
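
The question is truncated before the failing step, so the exact cause of the ValueError is not visible. As a hedged alternative sketch: in scikit-learn 0.20 and later, `OneHotEncoder` accepts string categories directly, so the `LabelEncoder` pass is not required. The example rebuilds the `Country` column from the table above and encodes only that column.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Rebuild the small frame shown in the question.
dff = pd.DataFrame(
    {
        "Country": ["AUS", "USA", "IND", "USA"],
        "Name": ["Sri", "Vignesh", "Pechi", "Raj"],
    }
)

# Encode only the categorical column; double brackets keep it 2-D.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(dff[["Country"]]).toarray()

# Categories are sorted, so the columns are AUS, IND, USA.
print(encoder.categories_[0])
print(onehot)
```

Selecting the column explicitly avoids passing the free-text `Name` column to the encoder, where every distinct name would become its own category.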

ValueError: could not convert string to float in panda

混江龙づ霸主 submitted on 2020-01-11 11:35:31
Question: My code is:

    import pandas as pd
    data = pd.read_table('train.tsv')
    X=data.Phrase
    Y=data.Sentiment
    from sklearn import cross_validation
    X_train,X_test,Y_train,Y_test=cross_validation.train_test_split(X,Y,test_size=0.2,random_state=0)
    from sklearn.naive_bayes import MultinomialNB
    clf = MultinomialNB()
    clf.fit(X,Y)

I get the error: ValueError: could not convert string to float:

What changes can I make so that my code works?

Answer 1: You can't pass text data into MultinomialNB of scikit-learn as
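
The answer is cut off, so the fix it goes on to recommend is unknown. A common approach, sketched here as an assumption rather than as the answer's own method, is to turn the text into token counts with `CountVectorizer` before fitting `MultinomialNB`. The phrases and labels below are hypothetical and stand in for the `Phrase` and `Sentiment` columns of `train.tsv`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training phrases and sentiment labels (1 = positive, 0 = negative).
phrases = [
    "a great movie",
    "truly wonderful acting",
    "a terrible movie",
    "truly awful acting",
]
sentiment = [1, 1, 0, 0]

# The vectorizer converts each phrase to a row of word counts,
# which is the numeric input MultinomialNB requires.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(phrases, sentiment)

pred = clf.predict(["wonderful", "awful"])
print(pred)
```

Keeping both steps in one pipeline means new phrases are vectorized with the vocabulary learned during fitting, so `predict` can take raw strings.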