categorical-data

GBM multinomial distribution, how to use predict() to get predicted class?

拟墨画扇 submitted on 2019-12-01 03:46:22
I am using the multinomial distribution from the gbm package in R. When I use the predict function, I get a series of values:

    5.086328 -4.738346 -8.492738 -5.980720 -4.351102 -4.738044 -3.220387 -4.732654

but I want the probability of each class occurring. How do I recover the probabilities? Thank you.

Take a look at ?predict.gbm; you'll see that the function has a "type" parameter. Try predict(<gbm object>, <new data>, type="response").

smci: predict.gbm(..., type='response') is not implemented for multinomial, or indeed any distribution other than bernoulli or poisson. So
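If the raw values are per-class scores on the link scale, one common way to turn them into probabilities is a softmax across the classes of each observation. The sketch below is in Python rather than R and rests on assumptions the question does not state: that the scores for one observation are grouped together, and that there are three classes. The row and the labels "a", "b", "c" are hypothetical, since the excerpt does not say how the eight printed values split into classes.

```python
import math

def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(scores, labels):
    """Return the label with the highest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical: the first three printed values treated as one observation
row = [5.086328, -4.738346, -8.492738]
probs = softmax(row)
label, p = predicted_class(row, ["a", "b", "c"])
```

The predicted class is then simply the one with the largest probability for each observation.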

How to apply custom column order (on Categorical) to pandas boxplot?

谁都会走 submitted on 2019-12-01 03:21:41
EDIT: this question arose with pandas ~0.13 and was obsoleted by direct support somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer).

I can get a boxplot of a salary column in a pandas DataFrame...

    train.boxplot(column='Salary', by='Category', sym='')

...however, I can't figure out how to define the index order used on column 'Category'. I want to supply my own custom order, according to another criterion:

    category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()

How can I apply my custom column order to the boxplot columns? (other than ugly
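With the categorical support mentioned in the EDIT, one way to carry a custom order is to store the column as an ordered categorical. The sketch below uses a small hypothetical stand-in for `train`, and it uses `sort_values()` because `Series.order()` is no longer available in current pandas. It only builds the ordering and does not draw the plot.

```python
import pandas as pd

# Hypothetical stand-in for the `train` DataFrame in the question
train = pd.DataFrame({
    "Category": ["b", "a", "c", "a", "b", "c"],
    "Salary": [30, 50, 10, 70, 40, 20],
})

# Categories sorted by mean salary, lowest first
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# Store the column as an ordered categorical whose categories follow `order`
train["Category"] = pd.Categorical(train["Category"], categories=order, ordered=True)
```

After this, `train.boxplot(column='Salary', by='Category')` should lay out the boxes in category order, since grouping on a categorical column follows its categories rather than alphabetical order.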

geom_vline vertical line on x-axis with categorical data: ggplot2

若如初见. submitted on 2019-12-01 01:57:07
I have data that is ordered in classes, as described in this article: https://www.r-bloggers.com/from-continuous-to-categorical/ This makes it easier to see which values are common. After creating those classes I want to create a bar chart with the frequency of the different classes, which I do with the following example code:

    set.seed(1)
    df.v <- data.frame(val = rnorm(1000, mean(4, sd=2)))
    df.v$val.clss <- cut(df.v$val, seq(min(df.v$val), max(df.v$val), 1))
    p1 <- ggplot(data = df.v) + geom_bar(aes(val.clss))
    plot(p1)

What I cannot figure out is how to add a vertical line exactly between the

How Tensorflow handles categorical features with multiple inputs within one column?

半世苍凉 submitted on 2019-12-01 01:03:45
For example, I have data in the following CSV format:

    col0,col1,col2,col3
    1,A,E|A|C,3
    0,B,D|F,2
    2,C,|,2

Each comma-separated column represents one feature. Normally a feature is one-hot (e.g. col0, col1, col3), but in this case the feature in col2 has multiple inputs (separated by |). I'm sure TensorFlow can handle a one-hot feature with a sparse tensor, but I'm not sure whether it can handle features with multiple inputs like col2. How should it be represented in TensorFlow's sparse tensor? I am using the code below (but I don't know the input method for col2):

    col0 = tf.feature_column
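As a rough illustration of what a sparse representation of col2 could look like, the sketch below builds the three pieces that `tf.SparseTensor` takes (`indices`, `values`, `dense_shape`) in plain Python, without importing TensorFlow. The helper name `to_sparse_triple` is hypothetical, and treating a bare "|" as "no values" is an assumption about the data. How the result would be wired into the poster's feature columns is not covered here.

```python
def to_sparse_triple(rows, sep="|"):
    """Build (indices, values, dense_shape) for multi-valued string cells."""
    indices, values = [], []
    max_len = 0
    for r, cell in enumerate(rows):
        # Drop empty tokens so a bare "|" contributes no values for that row.
        tokens = [t for t in cell.split(sep) if t]
        for c, tok in enumerate(tokens):
            indices.append([r, c])  # [row, position within the row]
            values.append(tok)
        max_len = max(max_len, len(tokens))
    return indices, values, [len(rows), max_len]

# The col2 cells from the example CSV
col2 = ["E|A|C", "D|F", "|"]
indices, values, dense_shape = to_sparse_triple(col2)
```

Each row keeps only the entries it actually has, so rows with three, two, and zero values share one tensor shape without padding tokens.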

Consistent factor levels for same value over different datasets

孤街醉人 submitted on 2019-12-01 00:30:29
I'm not sure if I completely understand how factors work, so please correct me in an easy-to-understand way if I'm wrong. I always assumed that when doing regressions and whatnot, R converts categorical variables into integers behind the scenes, but this part was outside of my train of thought. It would use the categorical values in a training set and, after building a model, check for the same categorical value in the test dataset. Whatever the underlying 'levels' were didn't matter to me. However, I've been thinking more... and need clarification - especially if I'm doing this wrong on how
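The concern here, that the same value should map to the same underlying integer in the training and test data, can be illustrated by fixing the set of levels once and reusing it for both datasets. The sketch below shows the idea in Python with pandas rather than R; the colour values are hypothetical.

```python
import pandas as pd

train_vals = ["red", "green", "blue", "green"]
test_vals = ["blue", "red", "purple"]  # "purple" never appears in training

# Fix the level set once, from the training data only
levels = sorted(set(train_vals))

# Reuse the same levels for both datasets so the integer codes line up
train_cat = pd.Categorical(train_vals, categories=levels)
test_cat = pd.Categorical(test_vals, categories=levels)

train_codes = train_cat.codes.tolist()
test_codes = test_cat.codes.tolist()
```

A value that is missing from the fixed levels gets the code -1 (a missing value) instead of silently shifting the other codes. In R, the analogous move is to pass the same `levels` vector to `factor()` for both datasets.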

How to apply custom column order (on Categorical) to pandas boxplot?

 ̄綄美尐妖づ submitted on 2019-11-30 23:37:36
Question: EDIT: this question arose with pandas ~0.13 and was obsoleted by direct support somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer).

I can get a boxplot of a salary column in a pandas DataFrame...

    train.boxplot(column='Salary', by='Category', sym='')

...however, I can't figure out how to define the index order used on column 'Category'. I want to supply my own custom order, according to another criterion:

    category_order_by_mean_salary = train.groupby('Category')['Salary'].mean()

geom_vline vertical line on x-axis with categorical data: ggplot2

戏子无情 submitted on 2019-11-30 21:06:30
Question: I have data that is ordered in classes, as described in this article: https://www.r-bloggers.com/from-continuous-to-categorical/ This makes it easier to see which values are common. After creating those classes I want to create a bar chart with the frequency of the different classes, which I do with the following example code:

    set.seed(1)
    df.v <- data.frame(val = rnorm(1000, mean(4, sd=2)))
    df.v$val.clss <- cut(df.v$val, seq(min(df.v$val), max(df.v$val), 1))
    p1 <- ggplot(data = df.v)+ geom

Mosaic plot with labels in each box showing a name and percentage of all observations

独自空忆成欢 submitted on 2019-11-30 21:05:23
I would like to create a mosaic plot (R package vcd, see e.g. http://cran.r-project.org/web/packages/vcd/vignettes/residual-shadings.pdf ) with labels inside the plot. The labels should show either a combination of the various factors or some custom label and the percentage of total observations in this combination of categories (see e.g. http://i.usatoday.net/communitymanager/_photos/technology-live/2011/07/28/nielsen0728x-large.jpg , despite this not quite being a mosaic plot). I suspect something like the labeling_values function might play a role here, but I cannot quite get it to work.

Consistent factor levels for same value over different datasets

百般思念 submitted on 2019-11-30 19:32:48
Question: I'm not sure if I completely understand how factors work, so please correct me in an easy-to-understand way if I'm wrong. I always assumed that when doing regressions and whatnot, R converts categorical variables into integers behind the scenes, but this part was outside of my train of thought. It would use the categorical values in a training set and, after building a model, check for the same categorical value in the test dataset. Whatever the underlying 'levels' were didn't matter to me.

How to generate pandas DataFrame column of Categorical from string column?

笑着哭i submitted on 2019-11-30 19:22:09
I can convert a pandas string column to Categorical, but when I try to insert it as a new DataFrame column it seems to get converted right back to a Series of str:

    train['LocationNFactor'] = pd.Categorical.from_array(train['LocationNormalized'])
    >>> type(pd.Categorical.from_array(train['LocationNormalized']))
    <class 'pandas.core.categorical.Categorical'>
    # however it got converted back to...
    >>> type(train['LocationNFactor'][2])
    <type 'str'>
    >>> train['LocationNFactor'][2]
    'Hampshire'

Guessing this is because Categorical doesn't map to any numpy dtype; so do I have to convert it to some int type
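In pandas versions that have the `category` dtype, a string column can be converted with `astype("category")` and stored directly as a DataFrame column. `pd.Categorical.from_array` is, as far as I know, not available in recent releases. The sketch below uses hypothetical location values in place of the real `train` data.

```python
import pandas as pd

# Hypothetical stand-in for the `train` DataFrame in the question
train = pd.DataFrame(
    {"LocationNormalized": ["London", "Hampshire", "Hampshire", "Leeds"]}
)

# Convert the string column and keep the result as a categorical column
train["LocationNFactor"] = train["LocationNormalized"].astype("category")

dtype_name = str(train["LocationNFactor"].dtype)
categories = train["LocationNFactor"].cat.categories.tolist()
codes = train["LocationNFactor"].cat.codes.tolist()  # integer code per row
element = train["LocationNFactor"].iloc[2]
```

Note that a single element of a categorical column is still returned as its string label, so checking `type(...)` on one element shows `str` even when the column itself has dtype `category`. The column's `dtype`, or the integer codes under `.cat.codes`, are the better things to inspect.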