data-science

Retrieve final hidden activation layer output from sklearn's MLPClassifier

不羁的心 submitted on 2019-12-06 12:02:59
I would like to run some tests with the final hidden activation layer outputs of a neural network, using sklearn's MLPClassifier, after fitting the data. For example, if I create a classifier on data X_train with labels y_train, with two hidden layers of sizes (300, 100):

    clf = MLPClassifier(hidden_layer_sizes=(300, 100))
    clf.fit(X_train, y_train)

I would like to be able to call a function somehow to retrieve the final hidden activation layer vector of length 100 for use in additional tests. Assuming a test set X_test, y_test, normal prediction would be:

    preds = clf.predict(X_test)

But, I would like to
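A minimal sketch of one common workaround, assuming the default activation='relu' and a fitted clf as above: replay the forward pass by hand with the learned weights in clf.coefs_ and clf.intercepts_, stopping just before the output layer.

    import numpy as np

    def final_hidden_activations(clf, X):
        # Propagate through every layer except the output layer.
        a = np.asarray(X)
        for W, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
            a = np.maximum(a @ W + b, 0)  # relu; adapt if you changed the activation
        return a  # shape (n_samples, 100) for hidden_layer_sizes=(300, 100)

    hidden = final_hidden_activations(clf, X_test)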

How to get ROC curve for decision tree?

为君一笑 submitted on 2019-12-06 10:57:10
Question: I am trying to plot the ROC curve and compute the AUROC for a decision tree. My code was something like:

    clf.fit(x, y)
    y_score = clf.fit(x, y).decision_function(test[col])
    pred = clf.predict_proba(test[col])
    print(sklearn.metrics.roc_auc_score(actual, y_score))
    fpr, tpr, thre = sklearn.metrics.roc_curve(actual, y_score)

Output:

    AttributeError: 'DecisionTreeClassifier' object has no attribute 'decision_function'

Basically, the error comes up while computing y_score. Please explain what y_score is and how to solve
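A minimal sketch of the usual fix, assuming a binary problem where actual holds the true 0/1 labels and reusing the names from the question: DecisionTreeClassifier has no decision_function, so use the positive-class column of predict_proba as the score.

    from sklearn import metrics

    clf.fit(x, y)

    # Predicted probability of the positive class serves as y_score.
    y_score = clf.predict_proba(test[col])[:, 1]

    print(metrics.roc_auc_score(actual, y_score))
    fpr, tpr, thresholds = metrics.roc_curve(actual, y_score)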

lightgbm - ValueError: Circular reference detected

眉间皱痕 submitted on 2019-12-06 09:45:31
Train the model:

    import lightgbm as lgb

    lgb_train = lgb.Dataset(x_train, y_train)
    lgb_val = lgb.Dataset(x_test, y_test)

    parameters = {
        'application': 'binary',
        'objective': 'binary',
        'metric': 'auc',
        'is_unbalance': 'true',
        'boosting': 'gbdt',
        'num_leaves': 31,
        'feature_fraction': 0.5,
        'bagging_fraction': 0.5,
        'bagging_freq': 20,
        'learning_rate': 0.05,
        'verbose': 0
    }

    model = lgb.train(parameters,
                      lgb_train,
                      valid_sets=lgb_val,
                      num_boost_round=5000,
                      early_stopping_rounds=100)

    y_pred = model.predict(x_test)

I had what might be the same problem. Post the whole traceback to make sure. For me
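A hedged note on one documented trigger, since the full traceback isn't shown here: LightGBM JSON-serialises the category values of pandas categorical columns when building a Dataset, and categories that are not plain JSON types (for example, Interval objects produced by pd.cut/pd.qcut, or datetimes) can surface as "ValueError: Circular reference detected". A minimal sketch of that workaround, using a hypothetical binned column:

    import pandas as pd

    # Hypothetical feature binned with pd.cut; its categories are Interval
    # objects, which LightGBM's JSON serialisation cannot handle.
    x_train['amount_bin'] = pd.cut(x_train['amount'], bins=5)

    # Workaround: replace the Interval categories with plain integer codes
    # (or str(...) labels) before constructing the lgb.Dataset.
    x_train['amount_bin'] = x_train['amount_bin'].cat.codes

Separately, note that model.predict expects raw feature data (a numpy array or DataFrame), not a lgb.Dataset.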

Count number of counties per state using python {census}

大憨熊 submitted on 2019-12-06 08:49:04
Question: I am having trouble counting the number of counties using the well-known census.csv data. Task: count the number of counties in each state. I think I am facing a comparison problem; please read below. I've tried this:

    df = pd.read_csv('census.csv')
    dfd = df[:]['STNAME'].unique()  # gives the names of the states
    serr = pd.Series(dfd)           # converting to a Series (from an array)

After this, I've tried two approaches:

    # 1:
    df[df['STNAME'] == serr]  # ERROR: series length must match

    # 2:
    i = 0
    for name in serr:  # this generate
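A minimal sketch of the idiomatic approach, assuming census.csv has one row per county with the state name in STNAME (as in the common Coursera census extract, where county rows carry SUMLEV == 50): group by state and count rows, so no element-wise comparison against a Series is needed.

    import pandas as pd

    df = pd.read_csv('census.csv')

    # If the file mixes in state-level summary rows, keep only county rows;
    # SUMLEV == 50 marks counties in the Coursera variant of this dataset.
    counties = df[df['SUMLEV'] == 50] if 'SUMLEV' in df.columns else df

    # One row per county, so the group sizes are county counts per state.
    counties_per_state = counties.groupby('STNAME').size()
    print(counties_per_state)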

Word2vec fine-tuning

喜欢而已 submitted on 2019-12-06 06:38:09
Question: I am new to working with word2vec. I need to fine-tune my word2vec model. I have two datasets: data1 and data2. What I have done so far is:

    model = gensim.models.Word2Vec(data1, size=size_v, window=size_w,
                                   min_count=min_c, workers=work)
    model.train(data1, total_examples=len(data1), epochs=epochs)
    model.train(data2, total_examples=len(data2), epochs=epochs)

Is this correct? Do I need to store the learned weights somewhere? I checked this answer and this one, but I couldn't understand how it's done. Can
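A minimal sketch of the usual gensim pattern, assuming gensim 3.x (where the constructor already builds the vocabulary and trains on data1, making the extra model.train(data1, ...) redundant; in gensim 4.x the size parameter is named vector_size): grow the vocabulary with the new corpus before continuing training.

    import gensim

    # Builds the vocab and trains on data1 in one step.
    model = gensim.models.Word2Vec(data1, size=size_v, window=size_w,
                                   min_count=min_c, workers=work)

    # Add data2's words to the existing vocabulary, then keep training.
    model.build_vocab(data2, update=True)
    model.train(data2, total_examples=len(data2), epochs=model.epochs)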

Converting a pandas crosstab into a stacked dataframe (a regular table)

假如想象 submitted on 2019-12-06 04:31:52
Given a pandas crosstab, how do you convert it into a stacked DataFrame? Assume you start with a stacked DataFrame and first convert it into a crosstab; now I would like to revert back to the original stacked DataFrame. I searched for a problem statement that addresses this requirement but could not find one that hits bang on. In case I have missed any, please leave a note in the comment section. I would like to document the best practice here, so thank you for your support. I know that pandas.DataFrame.stack() would be the best approach, but one needs to be careful of the "level" stacking
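A minimal sketch of the round trip, using a toy stacked frame with hypothetical columns A, B and a count n: pd.crosstab builds the table, and .stack() plus reset_index() flattens it back.

    import pandas as pd

    stacked = pd.DataFrame({'A': ['x', 'x', 'y', 'y'],
                            'B': ['u', 'v', 'u', 'v'],
                            'n': [1, 2, 3, 4]})

    # Stacked frame -> crosstab.
    ct = pd.crosstab(index=stacked['A'], columns=stacked['B'],
                     values=stacked['n'], aggfunc='sum')

    # Crosstab -> stacked frame: stack the column level back into the index,
    # then promote both index levels to ordinary columns.
    back = ct.stack().reset_index(name='n')
    print(back)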

How to calculate p-values in Spark's Logistic Regression?

可紊 submitted on 2019-12-05 20:07:15
We are using LogisticRegressionWithSGD and would like to figure out which of our variables are predictive, and with what significance. Some stats packages (StatsModels) return p-values for each term; a low p-value (< 0.05) indicates a meaningful addition to the model. How can we get/calculate p-values from a LogisticRegressionWithSGD model? Any help with this is appreciated. This is a very old question, but some guidance for people coming to it late might be valuable. LogisticRegressionWithSGD is deprecated. In that version, no true set of "summary" information was provided with the model itself. If
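A minimal sketch of one modern route, assuming Spark 2.0+ and a DataFrame df with an assembled "features" vector column and a 0/1 "label" column: pyspark.ml's GeneralizedLinearRegression with a binomial family fits a logistic model whose training summary exposes per-coefficient p-values.

    from pyspark.ml.regression import GeneralizedLinearRegression

    glr = GeneralizedLinearRegression(family="binomial", link="logit",
                                      featuresCol="features", labelCol="label")
    model = glr.fit(df)

    # One p-value per coefficient, with the intercept's p-value last.
    print(model.summary.pValues)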

Python Pandas - Forward filling entire rows with value of one previous column

扶醉桌前 submitted on 2019-12-05 16:03:14
New to pandas development. How do I forward-fill a DataFrame with the value contained in one previously seen column? Self-contained example:

    import pandas as pd
    import numpy as np

    O = [1, np.nan, 5, np.nan]
    H = [5, np.nan, 5, np.nan]
    L = [1, np.nan, 2, np.nan]
    C = [5, np.nan, 2, np.nan]
    timestamps = ["2017-07-23 03:13:00", "2017-07-23 03:14:00",
                  "2017-07-23 03:15:00", "2017-07-23 03:16:00"]

    data = {'Open': O, 'High': H, 'Low': L, 'Close': C}  # renamed from "dict" to avoid shadowing the builtin
    df = pd.DataFrame(index=timestamps, data=data)
    ohlc = df[['Open', 'High', 'Low', 'Close']]

This yields the following DataFrame:

    print(ohlc)
    Open High Low
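A minimal sketch of one way to do this, assuming (the excerpt is truncated) that each all-NaN row should be filled with the most recent real Close value: forward-fill Close, then use it to plug the gaps in every column, aligned by index.

    # Each NaN row "remembers" the last observed Close.
    prev_close = ohlc['Close'].ffill()

    # Fill every column's missing cells with that remembered Close.
    filled = ohlc.apply(lambda col: col.fillna(prev_close))
    print(filled)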

scikit-learn - Convert pipeline prediction to original value/scale

江枫思渺然 submitted on 2019-12-05 11:31:23
I've created a pipeline as follows (using the Keras Scikit-Learn API):

    estimators = []
    estimators.append(('standardize', StandardScaler()))
    estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=50,
                                             batch_size=5, verbose=0)))
    pipeline = Pipeline(estimators)

and fit it with:

    pipeline.fit(trainX, trainY)

If I predict with pipeline.predict(testX), I (believe) I get standardised predictions. How do I predict on testX so that predictedY is at the same scale as the actual (untouched) testY (i.e. NOT a standardised prediction, but the actual values)? I see there is an inverse
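A clarifying note plus a minimal sketch under stated assumptions: the StandardScaler step only standardises the inputs X, so pipeline.predict already returns values on whatever scale trainY had. If trainY was standardised separately before fitting (a common pattern with KerasRegressor), keep that y-scaler around and apply its inverse_transform to the predictions; here trainY is assumed to be a 1-D numpy array.

    from sklearn.preprocessing import StandardScaler

    # Hypothetical y-scaler, assuming trainY was scaled before fitting.
    y_scaler = StandardScaler()
    trainY_scaled = y_scaler.fit_transform(trainY.reshape(-1, 1)).ravel()

    pipeline.fit(trainX, trainY_scaled)

    # Map predictions back to the original scale of trainY.
    pred_scaled = pipeline.predict(testX).reshape(-1, 1)
    predictedY = y_scaler.inverse_transform(pred_scaled).ravel()

In newer scikit-learn versions, sklearn.compose.TransformedTargetRegressor wraps exactly this scale/fit/inverse-transform pattern.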