categorical-data

Pandas: Convert lists within a single column to multiple columns

有些话、适合烂在心里 submitted on 2019-12-05 11:32:36
I have a dataframe that includes columns with multiple attributes separated by commas:

    df = pd.DataFrame({'id': [1,2,3], 'labels' : ["a,b,c", "c,a", "d,a,b"]})

       id labels
    0   1  a,b,c
    1   2    c,a
    2   3  d,a,b

(I know this isn't an ideal situation, but the data originates from an external source.) I want to turn the multi-attribute columns into multiple columns, one for each label, so that I can treat them as categorical variables. Desired output:

       id     a      b      c      d
    0   1  True   True   True  False
    1   2  True  False   True  False
    2   3  True   True  False   True

I can get the set of all possible attributes ( [a,b,c,d] ) fairly easily,
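The excerpt above is cut off before any answer, so the following is only a minimal sketch of one common way to get the desired output, not the approach taken in the original thread. It assumes the labels are separated by a bare comma with no surrounding whitespace and uses pandas' `Series.str.get_dummies`; the names `indicators` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

# Split each comma-separated string into 0/1 indicator columns
# (one per distinct label), then cast the indicators to booleans.
indicators = df['labels'].str.get_dummies(sep=',').astype(bool)

# Put the id column back next to the indicator columns.
result = pd.concat([df[['id']], indicators], axis=1)
print(result)
```

If the labels are needed as pandas categoricals rather than booleans, the indicator columns can be converted afterwards with `astype('category')`.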

How do I make a boxplot with two categorical variables in R? [closed]

耗尽温柔 submitted on 2019-12-04 21:18:02
Closed. This question needs details or clarity. It is not currently accepting answers. Closed 5 years ago.
I would like to make a boxplot that shows how time spent doing a behaviour (Alert) is affected by two variables (Period = Morning/Afternoon and Visitor Level = High/Low).

    Alert ~ Period + Vis.Level

'Alert' is a set of 12 numbers that show the amount of time spent awake, with the other two as the significant categorical variables. I have looked at other examples but none seem to fit this type of

R coxph() warning: Loglik converged before variable

百般思念 submitted on 2019-12-04 20:33:28
Question: I'm having some trouble using coxph(). I have two categorical variables, Sex and Probable Cause, that I want to use as predictor variables. Sex is just the typical male/female, but Probable Cause has 5 options. I don't understand what the warning message is telling me. Why do the confidence intervals run from 0 to Inf, and why are the p-values so high? Here's the code and the output:

    > my_coxph <- coxph(Surv(tempo,status) ~ factor(Sexo)+ factor(Causa.provavel) , data=ceabn)
    Warning message: In fitter(X,

SQL query to get the subtotal of some rows

会有一股神秘感。 submitted on 2019-12-04 17:18:47
What would be the SQL query if I want to get the total items and total revenue for each manager, including his team? Suppose I have this table items_revenue with columns:

    | id | is_manager | manager_id | name     | no_of_items | revenue |
    |  1 |          1 |          0 | Manager1 |         621 |     833 |
    |  2 |          1 |          0 | Manager2 |         458 |     627 |
    |  3 |          1 |          0 | Manager3 |         872 |    1027 |
    ...
    |  8 |          0 |          1 | Member1  |        1258 |    1582 |
    |  9 |          0 |          2 | Member2  |        5340 |    8827 |
    | 10 |          0 |          3 | Member3  |        3259 |    5124 |

All the managers and their respective members are in the above view table. Member1 is under Manager1, Member2 is under Manager2, and
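The question is truncated and no answer is shown, so what follows is a hedged sketch of one possible query rather than the accepted solution. It assumes a single level of hierarchy (members point at their manager through `manager_id`) and loads only the six rows visible in the excerpt; the rows hidden behind `...` are not reconstructed. The SQL is run through Python's built-in `sqlite3` module purely so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items_revenue (
           id INTEGER PRIMARY KEY, is_manager INTEGER, manager_id INTEGER,
           name TEXT, no_of_items INTEGER, revenue INTEGER)"""
)

# Only the rows shown in the excerpt; the elided rows are left out.
rows = [
    (1, 1, 0, "Manager1", 621, 833),
    (2, 1, 0, "Manager2", 458, 627),
    (3, 1, 0, "Manager3", 872, 1027),
    (8, 0, 1, "Member1", 1258, 1582),
    (9, 0, 2, "Member2", 5340, 8827),
    (10, 0, 3, "Member3", 3259, 5124),
]
conn.executemany("INSERT INTO items_revenue VALUES (?, ?, ?, ?, ?, ?)", rows)

# Self-join: m is the manager row, t is each team member reporting to m.
# COALESCE covers managers who have no team members at all.
query = """
SELECT m.name,
       m.no_of_items + COALESCE(SUM(t.no_of_items), 0) AS total_items,
       m.revenue     + COALESCE(SUM(t.revenue), 0)     AS total_revenue
FROM items_revenue AS m
LEFT JOIN items_revenue AS t ON t.manager_id = m.id
WHERE m.is_manager = 1
GROUP BY m.id, m.name, m.no_of_items, m.revenue
ORDER BY m.id
"""
totals = conn.execute(query).fetchall()
for name, total_items, total_revenue in totals:
    print(name, total_items, total_revenue)
```

A deeper hierarchy (managers reporting to other managers) would need a recursive query instead of a single self-join.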

categorical variable in logistic regression in r

ぐ巨炮叔叔 submitted on 2019-12-04 14:37:24
Question: How do I include a categorical variable in a binary logistic regression in R? I want to test the influence of the professional field (student, worker, teacher, self-employed) on the probability of a purchase of a product. In my example y is a binary variable (1 for buying a product, 0 for not buying).
- x1: the gender (0 male, 1 female)
- x2: the age (between 20 and 80)
- x3: the categorical variable (1=student, 2=worker, 3=teacher, 4=self-employed)

    set.seed(123)
    y<-round

Tensorflow embedding lookup with unequal sized lists

蓝咒 submitted on 2019-12-04 07:47:45
Hej guys, I'm trying to project multi-labeled categorical data into a dense space using embeddings. Here's a toy example. Let's say I have four categories and want to project them into a 2D space. Furthermore, I have two instances, the first one belonging to category 0 and the second one to category 1. The code will look something like this:

    sess = tf.InteractiveSession()
    embeddings = tf.Variable(tf.random_uniform([4, 2], -1.0, 1.0))
    sess.run(tf.global_variables_initializer())
    y = tf.nn.embedding_lookup(embeddings, [0,1])
    y.eval()

and return something like this:

    array([[ 0.93999457, -0.83051205

XGBoost Categorical Variables: Dummification vs encoding

♀尐吖头ヾ submitted on 2019-12-04 07:35:13
Question: When using XGBoost we need to convert categorical variables into numeric. Would there be any difference in performance/evaluation metrics between the methods of:
1. dummifying your categorical variables
2. encoding your categorical variables from e.g. (a,b,c) to (1,2,3)
ALSO: Would there be any reasons not to go with method 2 by using for example labelencoder?
Answer 1: xgboost only deals with numeric columns. If you have a feature [a,b,b,c] which describes a categorical variable ( i.e. no numeric
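To make the two methods in the question concrete, here is a small sketch that applies both to the feature [a,b,b,c] mentioned in the answer. It uses plain pandas rather than XGBoost or scikit-learn's `LabelEncoder`, so it only illustrates the shape of the resulting inputs, not any effect on model performance; the variable names are illustrative.

```python
import pandas as pd

feature = pd.Series(['a', 'b', 'b', 'c'], name='feature')

# Method 1: dummify (one-hot). One indicator column per category,
# so no ordering between categories is implied.
one_hot = pd.get_dummies(feature, prefix='feature')

# Method 2: integer encoding, (a, b, c) -> (1, 2, 3).
# The codes are ordered numbers, so a model that splits on thresholds
# will treat 'a' < 'b' < 'c' as if that order were meaningful.
encoded = feature.astype('category').cat.codes + 1

print(one_hot)
print(encoded.tolist())
```

The practical difference is that method 2 keeps a single column but introduces an artificial ordering, while method 1 avoids the ordering at the cost of one column per category.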

How to sort pandas dataframe by custom order on string index

依然范特西╮ submitted on 2019-12-04 06:48:34
I have the following data frame:

    import pandas as pd

    # Create DataFrame
    df = pd.DataFrame(
        {'id': [2967, 5335, 13950, 6141, 6169],
         'Player': ['Cedric Hunter', 'Maurice Baker',
                    'Ratko Varda', 'Ryan Bowen', 'Adrian Caldwell'],
         'Year': [1991, 2004, 2001, 2009, 1997],
         'Age': [27, 25, 22, 34, 31],
         'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],
         'G': [6, 7, 60, 52, 81]})
    df.set_index('Player', inplace=True)

It shows:

    Out[128]:
                     Age   G   Tm  Year     id
    Player
    Cedric Hunter     27   6  CHH  1991   2967
    Maurice Baker     25   7  VAN  2004   5335
    Ratko Varda       22  60  TOT  2001  13950
    Ryan Bowen        34  52  OKC  2009   6141
    Adrian Caldwell   31  81
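The excerpt stops before the desired order is stated, so the order used below is purely hypothetical. As a minimal sketch, and assuming the index labels are unique and every label appears in the custom list, `DataFrame.reindex` can reorder the rows by an explicit list of index values.

```python
import pandas as pd

df = pd.DataFrame(
    {'id': [2967, 5335, 13950, 6141, 6169],
     'Player': ['Cedric Hunter', 'Maurice Baker',
                'Ratko Varda', 'Ryan Bowen', 'Adrian Caldwell'],
     'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31],
     'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],
     'G': [6, 7, 60, 52, 81]}
).set_index('Player')

# Hypothetical custom order; the original question's order is not shown.
custom_order = ['Ryan Bowen', 'Cedric Hunter', 'Adrian Caldwell',
                'Maurice Baker', 'Ratko Varda']

# reindex returns the rows in exactly the order of the given labels.
sorted_df = df.reindex(custom_order)
print(sorted_df)
```

Labels in `custom_order` that are missing from the index would produce rows of NaN, so the list should be checked against `df.index` first if that can happen.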

How to manually set colours to a categorical variables using ggplot()? [duplicate]

梦想的初衷 submitted on 2019-12-04 06:06:02
Question: This question already has an answer here: Manually setting group colors for ggplot2 (1 answer). Closed 4 years ago.
This is my sample data:

    table1
      xaxis  yaxis                  ae    work
    1     5  35736 Attending_Education Working
    2     6  72286 Attending_Education Working
    3     7 133316 Attending_Education Working
    4     8 252520 Attending_Education Working
    5     9 228964 Attending_Education Working
    6    10 504676 Attending_Education Working

This is the code I used:

    p<-ggplot(table1,aes(x=table1$xaxis,y=table1$yaxis)) Economic

How to plot parallel coordinates with multiple categorical variables in R

天涯浪子 submitted on 2019-12-04 03:21:39
I am having difficulty plotting a parallel coordinates plot using ggparcoord from the GGally package. As there are two categorical variables, what I want to show in the visualisation is like the image below. I've found that in ggparcoord, groupColumn accepts only a single variable to group (colour) by. I can use showPoints to mark the values on the axes, but I also need to vary the shape of these markers according to the categorical variables. Is there another package that can help me realise my idea? Any response will be appreciated! Thanks! It's not that