regression

The output of my regression NN with LSTMs is wrong even with low val_loss

烂漫一生 submitted on 2020-06-17 09:41:00

Question: The Model — I am currently working on a stack of LSTMs, trying to solve a regression problem. The architecture of the model is as below: comp_lstm = tf.keras.models.Sequential([ tf.keras.layers.LSTM(64, return_sequences = True), tf.keras.layers.LSTM(64, return_sequences = True), tf.keras.layers.LSTM(64), tf.keras.layers.Dense(units
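A frequent cause of "low val_loss but wrong-looking predictions" in regression networks is that the targets were scaled before training and the raw network outputs were never mapped back to the original scale. The question does not show its preprocessing, so this is only a hypothetical sketch of that failure mode, using scikit-learn's StandardScaler in place of the actual pipeline:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical regression targets on an arbitrary scale
y_train = np.array([[100.0], [150.0], [200.0], [250.0]])

scaler = StandardScaler()
y_scaled = scaler.fit_transform(y_train)  # what the network would be trained on

# Stand-in for model.predict(): a perfect model predicting in the scaled space
y_pred_scaled = y_scaled

# val_loss is computed in the scaled space, so it can be tiny while the raw
# outputs look nothing like the original targets; the inverse transform fixes this
y_pred = scaler.inverse_transform(y_pred_scaled)
```

If the targets were scaled this way, comparing `y_pred_scaled` directly against the unscaled ground truth would look "wrong" despite a near-zero validation loss.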

applying a user defined function to a dataframe

情到浓时终转凉″ submitted on 2020-06-17 09:35:51

Question: The function I'm trying to write takes the dataframe provided, calculates the F statistic values, and returns those as the output. Data format:

Final Key  Color  Strength  Fabric  Sales
a          0      1         1       10
b          1      2         2       15

Here Color, Strength and Fabric are independent while Sales is dependent. The idea is to create a loop that creates a new dataframe for every unique key value, performs a function over this dataframe, and then creates a new dataframe that is a concat of all the new dataframes obtained
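The loop-and-concat described above is what pandas' groupby/apply does in one step: apply a function to each per-key sub-frame and concatenate the results. A sketch with made-up data matching the question's layout (the column names and values here are illustrative, and f_regression from scikit-learn stands in for whatever F-statistic the author intends):

```python
import pandas as pd
from sklearn.feature_selection import f_regression

# Hypothetical data in the layout shown in the question
df = pd.DataFrame({
    "Key":      ["a", "a", "a", "b", "b", "b"],
    "Color":    [0, 1, 2, 1, 2, 3],
    "Strength": [1, 2, 3, 2, 3, 4],
    "Fabric":   [1, 1, 2, 2, 2, 3],
    "Sales":    [10, 12, 15, 15, 17, 21],
})

features = ["Color", "Strength", "Fabric"]

def f_stats(group: pd.DataFrame) -> pd.Series:
    # F statistic of each independent column against the dependent Sales column
    f_vals, _ = f_regression(group[features], group["Sales"])
    return pd.Series(f_vals, index=features)

# One row of F statistics per unique Key; concatenation happens automatically
result = df.groupby("Key").apply(f_stats)
```

`result` is a dataframe indexed by Key with one F value per feature, which replaces the explicit loop plus pd.concat.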

fill missing values (nan) by regression of other columns

青春壹個敷衍的年華 submitted on 2020-06-17 05:29:03

Question: I've got a dataset containing a lot of missing values (NaN). I want to use linear or multilinear regression in Python to fill all the missing values. You can find the dataset here: Dataset. I have used f_regression(X_train, Y_train) to select which features to use. First of all I converted df['country'] to dummy variables, then used the important features, then ran the regression, but the results are not good. I have defined the following functions to select features and fill missing values: def select_features
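Regressing each incomplete column on the others is exactly what scikit-learn's IterativeImputer does, which may be simpler than hand-rolled selection and fitting. A minimal sketch on a small made-up numeric frame (the real dataset is only linked in the question, so these columns are placeholders):

```python
import numpy as np
import pandas as pd
# IterativeImputer is still flagged experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical numeric frame with missing values
df = pd.DataFrame({
    "gdp":        [1.0, 2.0, np.nan, 4.0, 5.0],
    "life_exp":   [60.0, 65.0, 70.0, np.nan, 80.0],
    "population": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each column with NaNs is modelled as a regression on the other columns,
# iterating round-robin until the imputed values stabilise
imputer = IterativeImputer(random_state=0)
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

Categorical columns such as df['country'] would still need dummy encoding first, as the question already does.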

eli5: show_weights() with two labels

这一生的挚爱 submitted on 2020-06-13 06:00:31

Question: I'm trying eli5 in order to understand the contribution of terms to the prediction of certain classes. You can run this script: import numpy as np from sklearn.feature_extraction.text import CountVectorizer from sklearn.linear_model import LogisticRegression from sklearn.pipeline import Pipeline from sklearn.datasets import fetch_20newsgroups #categories = ['alt.atheism', 'soc.religion.christian'] categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics'] np.random.seed(1) train

Weighted logistic regression in Python

孤人 submitted on 2020-06-09 08:14:07

Question: I'm looking for a good implementation of (non-regularized) logistic regression in Python, ideally a package that can also take a weight for each vector. Can anyone suggest a good implementation/package? Thanks! Answer 1: I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model by simply passing in 'log' as the loss: sklearn.linear_model.SGDClassifier(loss='log', ...). This
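For reference, scikit-learn's LogisticRegression itself accepts per-sample weights through the sample_weight argument of fit, which answers the weighting requirement directly (and note that in recent scikit-learn releases the SGDClassifier loss name has been renamed from 'log' to 'log_loss'). A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)

# Up-weight the second half of the samples five-fold
weights = np.ones(100)
weights[50:] = 5.0

clf = LogisticRegression()
clf.fit(X, y, sample_weight=weights)
probs = clf.predict_proba(X)  # one probability column per class
```

LogisticRegression applies L2 regularization by default; a very large C (e.g. C=1e9) approximates the unregularized model the question asks for.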

package emmeans in R not returning effect sizes

空扰寡人 submitted on 2020-05-28 06:12:14

Question: I'm following this tutorial, as well as ?eff_size from the emmeans package, to compute eff_size() for my regression model below. But I get the error: need an object with call component from the eff_size() call. Am I missing something? library(lme4) library(emmeans) h <- read.csv('https://raw.githubusercontent.com/hkil/m/master/h.csv') h$year <- as.factor(h$year) m <- lmer(scale~year*group + (1|stid), data = h) ems <- emmeans(m, pairwise ~ group*year, infer = c(T, T)) eff_size(ems, sigma = sigma(m),

Why is bam from mgcv slow for some data?

删除回忆录丶 submitted on 2020-05-25 18:39:18

Question: I am fitting the same Generalized Additive Model to multiple data sets using the bam function from mgcv. For most of my data sets the fit completes within a reasonable 10 to 20 minutes, but for a few data sets the run takes more than 10 hours to complete. I cannot find any similarities between the slow cases: the final fit is neither exceptionally good nor bad, nor do they contain any noticeable outliers. How can I figure out why the fit is so slow for these instances? And how

How to calculated the adjusted R2 value using scikit

好久不见. submitted on 2020-05-25 07:36:10

Question: I have a dataset for which I have to develop various models and compute the adjusted R2 value of all of them. cv = KFold(n_splits=5,shuffle=True,random_state=45) r2 = make_scorer(r2_score) r2_val_score = cross_val_score(clf, x, y, cv=cv,scoring=r2) scores=[r2_val_score.mean()] return scores I have used the above code to calculate the R2 value of every model, but I am more interested in the adjusted R2 value of each model. Is there any package in Python that can do the job? I will
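scikit-learn ships no adjusted-R² scorer, but the value follows in one line from r2_score: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. A sketch on synthetic data (the estimator and data here are placeholders for the question's own clf, x, y):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression problem standing in for the real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=45)
model = LinearRegression().fit(X, y)

r2 = r2_score(y, model.predict(X))
n, p = X.shape  # n samples, p predictors

# Adjusted R^2 penalises plain R^2 for the number of predictors used
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

The same formula can be applied to each fold's score inside the cross_val_score loop, using that fold's sample count for n.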