data-science

Retrieve final hidden activation layer output from sklearn's MLPClassifier

不羁的心 submitted on 2019-12-06 12:02:59
I would like to run some tests with the final hidden activation layer outputs of a neural network, using sklearn's MLPClassifier, after fitting the data. For example, if I create a classifier on data X_train with labels y_train, with two hidden layers of sizes (300, 100):

    clf = MLPClassifier(hidden_layer_sizes=(300, 100))
    clf.fit(X_train, y_train)

I would like to be able to call a function somehow to retrieve the final hidden activation layer vector of length 100 for use in additional tests. Assuming a test set X_test, y_test, normal prediction would be:

    preds = clf.predict(X_test)

But, I would like to
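A minimal sketch of one common workaround, assuming the default activation='relu' and a fitted clf as above: replay the forward pass by hand with the learned weights in clf.coefs_ and clf.intercepts_, stopping just before the output layer.

    import numpy as np

    def final_hidden_activations(clf, X):
        # Propagate through every layer except the output layer.
        a = np.asarray(X)
        for W, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
            a = np.maximum(a @ W + b, 0)  # relu; adapt if you changed the activation
        return a  # shape (n_samples, 100) for hidden_layer_sizes=(300, 100)

    hidden = final_hidden_activations(clf, X_test)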

How to get ROC curve for decision tree?

为君一笑 submitted on 2019-12-06 10:57:10
Question: I am trying to plot the ROC curve and compute the AUROC for a decision tree. My code was something like:

    clf.fit(x, y)
    y_score = clf.fit(x, y).decision_function(test[col])
    pred = clf.predict_proba(test[col])
    print(sklearn.metrics.roc_auc_score(actual, y_score))
    fpr, tpr, thre = sklearn.metrics.roc_curve(actual, y_score)

Output:

    AttributeError: 'DecisionTreeClassifier' object has no attribute 'decision_function'

Basically, the error comes up while computing y_score. Please explain what y_score is and how to solve
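A minimal sketch of the usual fix, assuming a binary problem where actual holds the true 0/1 labels and reusing the names from the question: DecisionTreeClassifier has no decision_function, so use the positive-class column of predict_proba as the score.

    from sklearn import metrics

    clf.fit(x, y)

    # Predicted probability of the positive class serves as y_score.
    y_score = clf.predict_proba(test[col])[:, 1]

    print(metrics.roc_auc_score(actual, y_score))
    fpr, tpr, thresholds = metrics.roc_curve(actual, y_score)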

lightgbm - ValueError: Circular reference detected

眉间皱痕 submitted on 2019-12-06 09:45:31
Train the model:

    import lightgbm as lgb

    lgb_train = lgb.Dataset(x_train, y_train)
    lgb_val = lgb.Dataset(x_test, y_test)

    parameters = {
        'application': 'binary',
        'objective': 'binary',
        'metric': 'auc',
        'is_unbalance': 'true',
        'boosting': 'gbdt',
        'num_leaves': 31,
        'feature_fraction': 0.5,
        'bagging_fraction': 0.5,
        'bagging_freq': 20,
        'learning_rate': 0.05,
        'verbose': 0
    }

    model = lgb.train(parameters,
                      lgb_train,
                      valid_sets=lgb_val,
                      num_boost_round=5000,
                      early_stopping_rounds=100)

    y_pred = model.predict(x_test)

I had what might be the same problem. Post the whole traceback to make sure. For me
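A hedged note on one documented trigger, since the full traceback isn't shown here: LightGBM JSON-serialises the category values of pandas categorical columns when building a Dataset, and categories that are not plain JSON types (for example, Interval objects produced by pd.cut/pd.qcut, or datetimes) can surface as "ValueError: Circular reference detected". A minimal sketch of that workaround, using a hypothetical binned column:

    import pandas as pd

    # Hypothetical feature binned with pd.cut; its categories are Interval
    # objects, which LightGBM's JSON serialisation cannot handle.
    x_train['amount_bin'] = pd.cut(x_train['amount'], bins=5)

    # Workaround: replace the Interval categories with plain integer codes
    # (or str(...) labels) before constructing the lgb.Dataset.
    x_train['amount_bin'] = x_train['amount_bin'].cat.codes

Separately, note that model.predict expects raw feature data (a numpy array or DataFrame), not a lgb.Dataset.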

Count number of counties per state using python {census}

大憨熊 submitted on 2019-12-06 08:49:04
Question: I am having trouble counting the number of counties using the well-known census.csv data. Task: count the number of counties in each state. I think I am facing a comparison problem; please read below. I've tried this:

    df = pd.read_csv('census.csv')
    dfd = df[:]['STNAME'].unique()  # gives the names of the states
    serr = pd.Series(dfd)           # converting to a Series (from an array)

After this, I've tried two approaches:

    # 1:
    df[df['STNAME'] == serr]  # ERROR: series length must match

    # 2:
    i = 0
    for name in serr:  # this generate
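A minimal sketch of the idiomatic approach, assuming census.csv has one row per county with the state name in STNAME (as in the common Coursera census extract, where county rows carry SUMLEV == 50): group by state and count rows, so no element-wise comparison against a Series is needed.

    import pandas as pd

    df = pd.read_csv('census.csv')

    # If the file mixes in state-level summary rows, keep only county rows;
    # SUMLEV == 50 marks counties in the Coursera variant of this dataset.
    counties = df[df['SUMLEV'] == 50] if 'SUMLEV' in df.columns else df

    # One row per county, so the group sizes are county counts per state.
    counties_per_state = counties.groupby('STNAME').size()
    print(counties_per_state)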

Word2vec fine-tuning

喜欢而已 submitted on 2019-12-06 06:38:09
Question: I am new to working with word2vec. I need to fine-tune my word2vec model. I have two datasets: data1 and data2. What I have done so far is:

    model = gensim.models.Word2Vec(data1, size=size_v, window=size_w,
                                   min_count=min_c, workers=work)
    model.train(data1, total_examples=len(data1), epochs=epochs)
    model.train(data2, total_examples=len(data2), epochs=epochs)

Is this correct? Do I need to store the learned weights somewhere? I checked this answer and this one, but I couldn't understand how it's done. Can
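A minimal sketch of the usual gensim pattern, assuming gensim 3.x (where the constructor already builds the vocabulary and trains on data1, making the extra model.train(data1, ...) redundant; in gensim 4.x the size parameter is named vector_size): grow the vocabulary with the new corpus before continuing training.

    import gensim

    # Builds the vocab and trains on data1 in one step.
    model = gensim.models.Word2Vec(data1, size=size_v, window=size_w,
                                   min_count=min_c, workers=work)

    # Add data2's words to the existing vocabulary, then keep training.
    model.build_vocab(data2, update=True)
    model.train(data2, total_examples=len(data2), epochs=model.epochs)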

Converting a pandas crosstab into a stacked dataframe (a regular table)

假如想象 submitted on 2019-12-06 04:31:52
Given a pandas crosstab, how do you convert it into a stacked DataFrame? Assume you start with a stacked DataFrame and first convert it into a crosstab; now I would like to revert back to the original stacked DataFrame. I searched for a problem statement that addresses this requirement but could not find one that hits bang on. In case I have missed any, please leave a note in the comment section. I would like to document the best practice here, so thank you for your support. I know that pandas.DataFrame.stack() would be the best approach, but one needs to be careful of the "level" stacking
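A minimal sketch of the round trip, using a toy stacked frame with hypothetical columns A, B and a count n: pd.crosstab builds the table, and .stack() plus reset_index() flattens it back.

    import pandas as pd

    stacked = pd.DataFrame({'A': ['x', 'x', 'y', 'y'],
                            'B': ['u', 'v', 'u', 'v'],
                            'n': [1, 2, 3, 4]})

    # Stacked frame -> crosstab.
    ct = pd.crosstab(index=stacked['A'], columns=stacked['B'],
                     values=stacked['n'], aggfunc='sum')

    # Crosstab -> stacked frame: stack the column level back into the index,
    # then promote both index levels to ordinary columns.
    back = ct.stack().reset_index(name='n')
    print(back)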

How to calculate p-values in Spark's Logistic Regression?

可紊 submitted on 2019-12-05 20:07:15
We are using LogisticRegressionWithSGD and would like to figure out which of our variables are predictive, and with what significance. Some stats packages (StatsModels) return p-values for each term; a low p-value (< 0.05) indicates a meaningful addition to the model. How can we get/calculate p-values from a LogisticRegressionWithSGD model? Any help with this is appreciated. This is a very old question, but some guidance for people coming to it late might be valuable. LogisticRegressionWithSGD is deprecated. In that version, no true set of "summary" information was provided with the model itself. If
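A minimal sketch of one modern route, assuming Spark 2.0+ and a DataFrame df with an assembled "features" vector column and a 0/1 "label" column: pyspark.ml's GeneralizedLinearRegression with a binomial family fits a logistic model whose training summary exposes per-coefficient p-values.

    from pyspark.ml.regression import GeneralizedLinearRegression

    glr = GeneralizedLinearRegression(family="binomial", link="logit",
                                      featuresCol="features", labelCol="label")
    model = glr.fit(df)

    # One p-value per coefficient, with the intercept's p-value last.
    print(model.summary.pValues)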

Python Pandas - Forward filling entire rows with value of one previous column

扶醉桌前 submitted on 2019-12-05 16:03:14
New to pandas development. How do I forward-fill a DataFrame with the value contained in one previously seen column? Self-contained example:

    import pandas as pd
    import numpy as np

    O = [1, np.nan, 5, np.nan]
    H = [5, np.nan, 5, np.nan]
    L = [1, np.nan, 2, np.nan]
    C = [5, np.nan, 2, np.nan]
    timestamps = ["2017-07-23 03:13:00", "2017-07-23 03:14:00",
                  "2017-07-23 03:15:00", "2017-07-23 03:16:00"]

    data = {'Open': O, 'High': H, 'Low': L, 'Close': C}  # renamed from "dict" to avoid shadowing the builtin
    df = pd.DataFrame(index=timestamps, data=data)
    ohlc = df[['Open', 'High', 'Low', 'Close']]

This yields the following DataFrame:

    print(ohlc)
    Open High Low
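A minimal sketch of one way to do this, assuming (the excerpt is truncated) that each all-NaN row should be filled with the most recent real Close value: forward-fill Close, then use it to plug the gaps in every column, aligned by index.

    # Each NaN row "remembers" the last observed Close.
    prev_close = ohlc['Close'].ffill()

    # Fill every column's missing cells with that remembered Close.
    filled = ohlc.apply(lambda col: col.fillna(prev_close))
    print(filled)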

scikit-learn - Convert pipeline prediction to original value/scale

江枫思渺然 submitted on 2019-12-05 11:31:23
I've created a pipeline as follows (using the Keras Scikit-Learn API):

    estimators = []
    estimators.append(('standardize', StandardScaler()))
    estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=50,
                                             batch_size=5, verbose=0)))
    pipeline = Pipeline(estimators)

and fit it with:

    pipeline.fit(trainX, trainY)

If I predict with pipeline.predict(testX), I (believe) I get standardised predictions. How do I predict on testX so that predictedY is at the same scale as the actual (untouched) testY (i.e. NOT a standardised prediction, but the actual values)? I see there is an inverse
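A clarifying note plus a minimal sketch under stated assumptions: the StandardScaler step only standardises the inputs X, so pipeline.predict already returns values on whatever scale trainY had. If trainY was standardised separately before fitting (a common pattern with KerasRegressor), keep that y-scaler around and apply its inverse_transform to the predictions; here trainY is assumed to be a 1-D numpy array.

    from sklearn.preprocessing import StandardScaler

    # Hypothetical y-scaler, assuming trainY was scaled before fitting.
    y_scaler = StandardScaler()
    trainY_scaled = y_scaler.fit_transform(trainY.reshape(-1, 1)).ravel()

    pipeline.fit(trainX, trainY_scaled)

    # Map predictions back to the original scale of trainY.
    pred_scaled = pipeline.predict(testX).reshape(-1, 1)
    predictedY = y_scaler.inverse_transform(pred_scaled).ravel()

In newer scikit-learn versions, sklearn.compose.TransformedTargetRegressor wraps exactly this scale/fit/inverse-transform pattern.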