categorical-data | 易学教程

GBM multinomial distribution, how to use predict() to get predicted class?


I am using the multinomial distribution from the gbm package in R. When I use the predict function, I get a series of values:

    5.086328 -4.738346 -8.492738 -5.980720 -4.351102 -4.738044 -3.220387 -4.732654

but I want to get the probability of each class occurring. How do I recover the probabilities? Thank you.

Take a look at ?predict.gbm; you'll see that there is a "type" parameter to the function. Try predict(<gbm object>, <new data>, type="response").

smci: predict.gbm(..., type='response') is not implemented for multinomial, or indeed any distribution other than bernoulli or poisson. So
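If the values returned by predict are raw per-class scores on the link scale, a common way to turn them into class probabilities is the softmax transformation. The following is a minimal sketch in Python (not R, and not gbm's own code). Treating the first three values from the question as the scores of one observation over three classes is an assumption made for illustration, and the class labels are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores, class_labels):
    """Return the most probable label together with all class probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_labels[best], probs

# Assumption: one observation, three classes, scores taken from the question.
raw_scores = [5.086328, -4.738346, -8.492738]
labels = ["class_a", "class_b", "class_c"]  # hypothetical labels

label, probs = predict_class(raw_scores, labels)
print(label, probs)
```

The predicted class is simply the label whose probability is largest, so taking the index of the largest raw score gives the same class without computing the probabilities at all.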

How to apply custom column order (on Categorical) to pandas boxplot?


EDIT: this question arose with pandas ~0.13 and was made obsolete by direct support added somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer).

I can get a boxplot of a salary column in a pandas DataFrame...

    train.boxplot(column='Salary', by='Category', sym='')

...however, I can't figure out how to define the index order used on column 'Category'. I want to supply my own custom order, according to another criterion:

    category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()

How can I apply my custom column order to the boxplot columns? (other than ugly
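The ordering step itself does not depend on pandas. As a rough sketch using only the Python standard library, the categories can be sorted by their mean value and the values grouped in that same order, giving one list per box. The function name and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

def order_categories_by_mean(rows):
    """Group (category, value) pairs and order the categories by group mean."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    # Categories sorted by ascending mean of their values
    order = sorted(groups, key=lambda c: mean(groups[c]))
    # One list of values per category, in the same order
    return order, [groups[c] for c in order]

# Hypothetical (Category, Salary) rows
rows = [
    ("Legal", 60000), ("IT", 45000), ("Legal", 70000),
    ("Retail", 20000), ("IT", 55000), ("Retail", 24000),
]

order, data = order_categories_by_mean(rows)
print(order)
print(data)
```

A plotting routine that accepts one list of values per box can then draw the boxes in this order and use the ordered category names as tick labels.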

geom_vline vertical line on x-axis with categorical data: ggplot2


I have data that is ordered in classes, as described in this article: https://www.r-bloggers.com/from-continuous-to-categorical/ This makes it easier to see which values are common. After creating those classes I want to create a bar chart with the frequency of the different classes, which I do with the following example code:

    library(ggplot2)
    set.seed(1)
    # 1000 draws from a normal distribution with mean 4 and sd 2
    df.v <- data.frame(val = rnorm(1000, mean = 4, sd = 2))
    # bin the values into classes of width 1
    df.v$val.clss <- cut(df.v$val, seq(min(df.v$val), max(df.v$val), 1))
    p1 <- ggplot(data = df.v) +
      geom_bar(aes(val.clss))
    plot(p1)

What I cannot figure out is how to add a vertical line exactly between the

How Tensorflow handles categorical features with multiple inputs within one column?


For example, I have data in the following csv format:

    col0,col1,col2,col3
    1,A,E|A|C,3
    0,B,D|F,2
    2,C,|,2

Each column, separated by a comma, represents one feature. Normally a feature is one-hot (e.g. col0, col1, col3), but in this case the feature for col2 has multiple inputs (separated by |). I'm sure TensorFlow can handle a one-hot feature with a sparse tensor, but I'm not sure whether it can handle features with multiple inputs like col2. How should it be represented in TensorFlow's sparse tensor? I am using the code below (but I don't know the input method for col2):

    col0 = tf.feature_column
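To make the representation question concrete, here is a minimal sketch in plain Python (it does not call TensorFlow). It splits each col2 cell on | and builds the three pieces a sparse tensor is described by (indices, values and dense shape), plus a dense multi-hot encoding for comparison. The helper names and the vocabulary are assumptions made for illustration.

```python
def to_sparse(cells, sep="|"):
    """Split each cell on sep and return (indices, values, dense_shape)."""
    indices, values = [], []
    max_len = 0
    for row, cell in enumerate(cells):
        tokens = [t for t in cell.split(sep) if t]  # drop empty tokens
        for col, token in enumerate(tokens):
            indices.append((row, col))
            values.append(token)
        max_len = max(max_len, len(tokens))
    return indices, values, (len(cells), max_len)

def multi_hot(cells, vocab, sep="|"):
    """Encode each cell as a 0/1 vector over a fixed vocabulary."""
    position = {token: i for i, token in enumerate(vocab)}
    encoded = []
    for cell in cells:
        vec = [0] * len(vocab)
        for token in cell.split(sep):
            if token in position:
                vec[position[token]] = 1
        encoded.append(vec)
    return encoded

col2 = ["E|A|C", "D|F", "|"]           # the col2 cells from the question
vocab = ["A", "C", "D", "E", "F"]      # assumed vocabulary

indices, values, dense_shape = to_sparse(col2)
encoded = multi_hot(col2, vocab)
print(indices, values, dense_shape)
print(encoded)
```

The third row, which contains only the separator, contributes no entries to the sparse form and becomes an all-zero vector in the multi-hot form.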

Consistent factor levels for same value over different datasets


I'm not sure if I completely understand how factors work, so please correct me in an easy-to-understand way if I'm wrong. I always assumed that when doing regressions and whatnot, R converts categorical variables into integers behind the scenes, but this part was outside of my train of thought. It would use the categorical values in a training set and, after building a model, check for the same categorical value in the test dataset. Whatever the underlying 'levels' were didn't matter to me. However, I've been thinking more... and need clarification, especially if I'm doing this wrong on how
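The concern about levels can be illustrated with a small sketch in plain Python (not R). The idea is to derive the level-to-integer mapping once from the training data and reuse that same mapping for the test data, so that a given value always receives the same code. The function names and sample values are hypothetical; sorting the levels alphabetically and numbering them from 1 mirrors R's default behaviour for character data.

```python
def fit_levels(values):
    """Build a level -> integer code mapping from the training values."""
    return {level: code for code, level in enumerate(sorted(set(values)), start=1)}

def encode(values, levels):
    """Encode values with an existing mapping; unseen values become None."""
    return [levels.get(v) for v in values]

train = ["red", "green", "blue", "green"]   # hypothetical training column
test = ["blue", "red", "purple"]            # "purple" never appears in train

levels = fit_levels(train)
train_codes = encode(train, levels)
test_codes = encode(test, levels)
print(levels)
print(train_codes, test_codes)
```

If the mapping were rebuilt from the test data alone, "blue" and "red" would receive different codes than in training, which is exactly the inconsistency the question worries about. Values not seen in training have no code here, in the same way that a value outside the supplied levels becomes NA in an R factor.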

Mosaic plot with labels in each box showing a name and percentage of all observations


I would like to create a mosaic plot (R package vcd, see e.g. http://cran.r-project.org/web/packages/vcd/vignettes/residual-shadings.pdf) with labels inside the plot. The labels should show either a combination of the various factors or some custom label and the percentage of total observations in this combination of categories (see e.g. http://i.usatoday.net/communitymanager/_photos/technology-live/2011/07/28/nielsen0728x-large.jpg, despite this not quite being a mosaic plot). I suspect something like the labeling_values function might play a role here, but I cannot quite get it to work.

How to generate pandas DataFrame column of Categorical from string column?


I can convert a pandas string column to Categorical, but when I try to insert it as a new DataFrame column it seems to get converted right back to a Series of str:

    train['LocationNFactor'] = pd.Categorical.from_array(train['LocationNormalized'])

    >>> type(pd.Categorical.from_array(train['LocationNormalized']))
    <class 'pandas.core.categorical.Categorical'>
    # however it got converted back to...
    >>> type(train['LocationNFactor'][2])
    <type 'str'>
    >>> train['LocationNFactor'][2]
    'Hampshire'

Guessing this is because Categorical doesn't map to any numpy dtype; so do I have to convert it to some int type