regression

The output of my regression NN with LSTMs is wrong even with low val_loss

烂漫一生 submitted on 2020-06-17 09:41:00

Question: The Model — I am currently working on a stack of LSTMs, trying to solve a regression problem. The architecture of the model is as below: comp_lstm = tf.keras.models.Sequential([ tf.keras.layers.LSTM(64, return_sequences = True), tf.keras.layers.LSTM(64, return_sequences = True), tf.keras.layers.LSTM(64), tf.keras.layers.Dense(units
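A frequent cause of "low val_loss but wrong-looking predictions" in regression networks is that the targets were scaled before training and the raw network outputs were never mapped back to the original scale. The question does not show its preprocessing, so this is only a hypothetical sketch of that failure mode, using scikit-learn's StandardScaler in place of the actual pipeline:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical regression targets on an arbitrary scale
y_train = np.array([[100.0], [150.0], [200.0], [250.0]])

scaler = StandardScaler()
y_scaled = scaler.fit_transform(y_train)  # what the network would be trained on

# Stand-in for model.predict(): a perfect model predicting in the scaled space
y_pred_scaled = y_scaled

# val_loss is computed in the scaled space, so it can be tiny while the raw
# outputs look nothing like the original targets; the inverse transform fixes this
y_pred = scaler.inverse_transform(y_pred_scaled)
```

If the targets were scaled this way, comparing `y_pred_scaled` directly against the unscaled ground truth would look "wrong" despite a near-zero validation loss.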

applying a user defined function to a dataframe

情到浓时终转凉″ submitted on 2020-06-17 09:35:51

Question: The function I'm trying to write takes the dataframe provided, calculates the F statistic values, and returns those as the output. Data format:

Final Key  Color  Strength  Fabric  Sales
a          0      1         1       10
b          1      2         2       15

Here Color, Strength and Fabric are independent while Sales is dependent. The idea is to create a loop that creates a new dataframe for every unique key value, performs a function over this dataframe, and then creates a new dataframe that is a concat of all the new dataframes obtained
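The loop-and-concat described above is what pandas' groupby/apply does in one step: apply a function to each per-key sub-frame and concatenate the results. A sketch with made-up data matching the question's layout (the column names and values here are illustrative, and f_regression from scikit-learn stands in for whatever F-statistic the author intends):

```python
import pandas as pd
from sklearn.feature_selection import f_regression

# Hypothetical data in the layout shown in the question
df = pd.DataFrame({
    "Key":      ["a", "a", "a", "b", "b", "b"],
    "Color":    [0, 1, 2, 1, 2, 3],
    "Strength": [1, 2, 3, 2, 3, 4],
    "Fabric":   [1, 1, 2, 2, 2, 3],
    "Sales":    [10, 12, 15, 15, 17, 21],
})

features = ["Color", "Strength", "Fabric"]

def f_stats(group: pd.DataFrame) -> pd.Series:
    # F statistic of each independent column against the dependent Sales column
    f_vals, _ = f_regression(group[features], group["Sales"])
    return pd.Series(f_vals, index=features)

# One row of F statistics per unique Key; concatenation happens automatically
result = df.groupby("Key").apply(f_stats)
```

`result` is a dataframe indexed by Key with one F value per feature, which replaces the explicit loop plus pd.concat.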

fill missing values (nan) by regression of other columns

青春壹個敷衍的年華 submitted on 2020-06-17 05:29:03

Question: I've got a dataset containing a lot of missing values (NaN). I want to use linear or multilinear regression in Python to fill all the missing values. You can find the dataset here: Dataset. I have used f_regression(X_train, Y_train) to select which features to use. First of all I converted df['country'] to dummy variables, then used the important features, then ran the regression, but the results are not good. I have defined the following functions to select features and fill missing values: def select_features
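Regressing each incomplete column on the others is exactly what scikit-learn's IterativeImputer does, which may be simpler than hand-rolled selection and fitting. A minimal sketch on a small made-up numeric frame (the real dataset is only linked in the question, so these columns are placeholders):

```python
import numpy as np
import pandas as pd
# IterativeImputer is still flagged experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical numeric frame with missing values
df = pd.DataFrame({
    "gdp":        [1.0, 2.0, np.nan, 4.0, 5.0],
    "life_exp":   [60.0, 65.0, 70.0, np.nan, 80.0],
    "population": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each column with NaNs is modelled as a regression on the other columns,
# iterating round-robin until the imputed values stabilise
imputer = IterativeImputer(random_state=0)
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

Categorical columns such as df['country'] would still need dummy encoding first, as the question already does.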

eli5: show_weights() with two labels

这一生的挚爱 submitted on 2020-06-13 06:00:31

Question: I'm trying eli5 in order to understand the contribution of terms to the prediction of certain classes. You can run this script: import numpy as np from sklearn.feature_extraction.text import CountVectorizer from sklearn.linear_model import LogisticRegression from sklearn.pipeline import Pipeline from sklearn.datasets import fetch_20newsgroups #categories = ['alt.atheism', 'soc.religion.christian'] categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics'] np.random.seed(1) train

Weighted logistic regression in Python

孤人 submitted on 2020-06-09 08:14:07

Question: I'm looking for a good implementation of (non-regularized) logistic regression in Python, ideally a package that can also take a weight for each vector. Can anyone suggest a good implementation/package? Thanks! Answer 1: I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model by simply passing in 'log' as the loss: sklearn.linear_model.SGDClassifier(loss='log', ...). This
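For reference, scikit-learn's LogisticRegression itself accepts per-sample weights through the sample_weight argument of fit, which answers the weighting requirement directly (and note that in recent scikit-learn releases the SGDClassifier loss name has been renamed from 'log' to 'log_loss'). A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)

# Up-weight the second half of the samples five-fold
weights = np.ones(100)
weights[50:] = 5.0

clf = LogisticRegression()
clf.fit(X, y, sample_weight=weights)
probs = clf.predict_proba(X)  # one probability column per class
```

LogisticRegression applies L2 regularization by default; a very large C (e.g. C=1e9) approximates the unregularized model the question asks for.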

package emmeans in R not returning effect sizes

空扰寡人 submitted on 2020-05-28 06:12:14

Question: I'm following this tutorial, as well as ?eff_size from the emmeans package, to compute eff_size() for my regression model below. But I get the error: need an object with call component from the eff_size() call. Am I missing something? library(lme4) library(emmeans) h <- read.csv('https://raw.githubusercontent.com/hkil/m/master/h.csv') h$year <- as.factor(h$year) m <- lmer(scale~year*group + (1|stid), data = h) ems <- emmeans(m, pairwise ~ group*year, infer = c(T, T)) eff_size(ems, sigma = sigma(m),

Why is bam from mgcv slow for some data?

删除回忆录丶 submitted on 2020-05-25 18:39:18

Question: I am fitting the same Generalized Additive Model to multiple data sets using the bam function from mgcv. For most of my data sets the fit completes within a reasonable 10 to 20 minutes, but for a few data sets the run takes more than 10 hours to complete. I cannot find any similarities between the slow cases: the final fit is neither exceptionally good nor bad, nor do they contain any noticeable outliers. How can I figure out why the fit is so slow for these instances? And how

How to calculated the adjusted R2 value using scikit

好久不见. submitted on 2020-05-25 07:36:10

Question: I have a dataset for which I have to develop various models and compute the adjusted R2 value of all of them. cv = KFold(n_splits=5,shuffle=True,random_state=45) r2 = make_scorer(r2_score) r2_val_score = cross_val_score(clf, x, y, cv=cv,scoring=r2) scores=[r2_val_score.mean()] return scores I have used the above code to calculate the R2 value of every model, but I am more interested in the adjusted R2 value of each model. Is there any package in Python that can do the job? I will
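scikit-learn ships no adjusted-R² scorer, but the value follows in one line from r2_score: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. A sketch on synthetic data (the estimator and data here are placeholders for the question's own clf, x, y):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression problem standing in for the real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=45)
model = LinearRegression().fit(X, y)

r2 = r2_score(y, model.predict(X))
n, p = X.shape  # n samples, p predictors

# Adjusted R^2 penalises plain R^2 for the number of predictors used
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

The same formula can be applied to each fold's score inside the cross_val_score loop, using that fold's sample count for n.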