categorical-data

Handling NULL values in Spark StringIndexer

不想你离开。 posted on 2019-12-10 10:08:40
Question: I have a dataset with some categorical string columns and I want to represent them as double type. I used StringIndexer for this conversion and it works, but when I tried it on another dataset that has NULL values it threw a java.lang.NullPointerException and did not work. For better understanding, here is my code:
    for(col <- cols){
      out_name = col ++ "_"
      var indexer = new StringIndexer().setInputCol(col).setOutputCol(out_name)
      var indexed = indexer.fit(df).transform(df)
      df = (indexed ...
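
To make the problem in the excerpt above concrete, here is a minimal plain-Python sketch (not Spark code) of a label indexer that maps strings to doubles and sends NULL values to one extra bucket instead of failing. The names `fit_index` and `transform_labels` and the sample data are hypothetical, and the "most frequent label gets 0.0" ordering is chosen for illustration. Spark's StringIndexer exposes a `handleInvalid` parameter for related cases; whether it covers NULLs in the asker's Spark version is not stated in the excerpt.

```python
from collections import Counter

def fit_index(values):
    """Map each non-null label to a float index, most frequent label first."""
    counts = Counter(v for v in values if v is not None)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}

def transform_labels(values, mapping):
    """Replace labels by their index; None goes to one extra bucket."""
    null_bucket = float(len(mapping))
    return [mapping[v] if v is not None else null_bucket for v in values]

# Hypothetical column containing NULLs (None in Python).
colors = ["red", "blue", None, "red", "green", None, "red", "blue"]
mapping = fit_index(colors)
indexed = transform_labels(colors, mapping)
print(mapping)
print(indexed)
```

Giving NULL its own index keeps every row; dropping the NULL rows before fitting would be the alternative when missing values carry no information.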

Automatically use LRT to assess significance of entire factor variable

别说谁变了你拦得住时间么 posted on 2019-12-10 09:43:15
Question: R's output for a multivariable regression model that includes one or more factor variables does not automatically include a likelihood ratio test (LRT) of the significance of the entire factor variable in the model. For example:
    fake = data.frame(
      x1=rnorm(100),
      x2=sample(LETTERS[1:4], size=100, replace=TRUE),
      y=rnorm(100)
    )
    head(fake)
              x1 x2          y
    1  0.6152511  A  0.7682467
    2 -0.8215727  A -0.5389245
    3 -1.3287208  A -0.1797851
    4  0.5837217  D  0.9509888
    5 -0.2828024  C -0.9829126
    6  0.3971358  B -0.4895091
    m = ...

Tensorflow embedding for categorical feature

倖福魔咒の posted on 2019-12-10 08:03:53
Question: In machine learning, it is common to represent a categorical (specifically: nominal) feature with one-hot encoding. I am trying to learn how to use tensorflow's embedding layer to represent a categorical feature in a classification problem. I have tensorflow version 1.01 installed and I am using Python 3.6. I am aware of the tensorflow tutorial for word2vec, but it is not very instructive for my case. While building the tf.Graph, it uses NCE-specific weights and tf.nn.nce_loss. I just ...
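
The relationship between the two representations mentioned in the excerpt above can be sketched without TensorFlow: looking up row `i` of an embedding matrix gives the same vector as multiplying the one-hot encoding of `i` by that matrix. The sizes, the random initialisation, and the helper names `one_hot` and `matvec_row` below are assumptions made for illustration; in TensorFlow 1.x the lookup itself is provided by `tf.nn.embedding_lookup`.

```python
import random

def one_hot(index, size):
    """One-hot encode a category id as a list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec_row(row, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(row[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(n_cols)]

random.seed(0)
n_categories, embedding_dim = 5, 3  # hypothetical sizes
# One trainable row per category; random values stand in for learned weights.
embedding = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
             for _ in range(n_categories)]

category_id = 2
via_lookup = embedding[category_id]
via_one_hot = matvec_row(one_hot(category_id, n_categories), embedding)
print(via_lookup)
print(via_one_hot)
```

The lookup avoids building the one-hot vector at all, which is why an embedding is usually preferred when the number of categories is large.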

How to legend a raster using directly the raster attribute table and displaying the legend only for class displayed in the raster?

孤街浪徒 posted on 2019-12-10 05:48:02
Question: I would like to use the raster attribute table information to create the legend of a raster such as the raster 1, and display the legend only for the classes displayed in the raster. I built an example to explain what I would like to get.
1/ Build the raster
    r <- raster(ncol=10, nrow=10)
    values(r) <- sample(1:3,ncell(r),replace=T)
2/ Add the Raster Attribute Table
    r <- ratify(r) # build the Raster Attribute Table
    rat <- levels(r)[[1]] # get the values of the unique cells from the attribute table
    rat ...

How do I make a boxplot with two categorical variables in R? [closed]

不问归期 posted on 2019-12-10 00:10:34
Question: I would like to make a boxplot that shows how time spent doing a behaviour (Alert) is affected by two variables (Period = Morning/Afternoon and Visitor Level = High/Low).
    Alert ~ Period + Vis.Level
'Alert' is a set of 12 numbers that show the amount of time spent awake with the other two as the ...

“Automatically” calculate linear combination of parameter estimates with PROC GLM

自闭症网瘾萝莉.ら posted on 2019-12-08 01:41:07
Question: Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline). Problem/issue: I want to be able to calculate the value of a linear combination (i.e. using SAS as a calculator) of these dummy variables. For example, 2*B1 + 2*B2 + B3. In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the ...
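
The arithmetic behind such a linear combination can be sketched in a few lines of plain Python (not SAS or Stata). For weights c, estimates b and their covariance matrix V, the point estimate is c'b and its standard error is sqrt(c'Vc). The coefficient values and covariance matrix below are made up for illustration, and `linear_combination` is a hypothetical helper name.

```python
import math

def linear_combination(weights, betas, cov):
    """Return the estimate c'b and its standard error sqrt(c'Vc)."""
    n = len(weights)
    estimate = sum(w * b for w, b in zip(weights, betas))
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return estimate, math.sqrt(variance)

# Made-up estimates for the three dummy variables (B1, B2, B3).
betas = [0.50, -0.20, 1.10]
# Made-up symmetric covariance matrix of those estimates.
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.16]]
# Weights for the combination 2*B1 + 2*B2 + B3 from the question.
weights = [2, 2, 1]

estimate, std_err = linear_combination(weights, betas, cov)
print(estimate, std_err)
```

The off-diagonal covariance terms matter here: ignoring them and summing only the weighted variances would give a different standard error.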

Automatically compare nested models from mice's glm.mids

▼魔方 西西 posted on 2019-12-08 00:40:49
Question: I have a multiply-imputed model from R's mice package in which there are lots of factor variables. For example:
    library(mice)
    library(Hmisc)
    # turn all the variables into factors
    fake = nhanes
    fake$age = as.factor(nhanes$age)
    fake$bmi = cut2(nhanes$bmi, g=3)
    fake$chl = cut2(nhanes$chl, g=3)
    head(fake)
      age         bmi hyp       chl
    1   1        <NA>  NA      <NA>
    2   2 [20.4,25.5)   1 [187,206)
    3   1        <NA>   1 [187,206)
    4   3        <NA>  NA      <NA>
    5   1 [20.4,25.5)   1 [113,187)
    6   3        <NA>  NA [113,187)
    imput = mice(nhanes)
    # big model
    fit1 = glm ...

Plotting two categorical arrays in a histogram/bar chart?

半城伤御伤魂 posted on 2019-12-08 00:27:17
Question: I have a categorical array, race, and an array of yes/no answers, and I want to create a stacked bar/histogram plot in which each race has its own bar and each bar is broken up into two different colors: one for the respondents who said yes and one for those who said no. Is there any way to do this relatively simply in MATLAB? And is there a way to at least create a table that shows, for each race, how many said yes and how many said no? To clarify, there are 1250 rows in my data set, ...
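
The per-race yes/no table asked for in the excerpt above is a two-way count, sketched here in plain Python rather than MATLAB. The six sample rows and the race labels are made up for illustration; the resulting counts are the segment heights a stacked bar chart would use.

```python
from collections import Counter

# Hypothetical paired observations: one race label and one answer per respondent.
race = ["A", "B", "A", "C", "B", "A"]
answer = ["yes", "no", "no", "yes", "yes", "yes"]

# Count each (race, answer) pair once.
counts = Counter(zip(race, answer))

# Build one row per race with its yes and no counts (missing pairs count as 0).
table = {r: {"yes": counts[(r, "yes")], "no": counts[(r, "no")]}
         for r in sorted(set(race))}

for r, row in table.items():
    print(r, row["yes"], row["no"])
```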

Lexical dispersion plot in seaborn

两盒软妹~` posted on 2019-12-07 18:06:44
Question: I am using the seaborn module to produce a plot similar to the example below.
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    location = "/global/scratch/umalmonj/WRF/juris/golden_hourly_manual_obs.csv"
    df = pd.read_csv(location,usecols= ["Year","Month","Day","Time","Weather"],parse_dates=[["Year","Month","Day","Time"]])
I have a df that looks like:
       Year_Month_Day_Time  Weather
    0  2010-01-01 00:00:00  NaN
    1  2010-01-01 01:00:00  NaN
    2  2010-01-01 02:00 ...

Meaning of “trait” in MCMCglmm

你。 posted on 2019-12-07 12:53:36
Question: As in this post, I'm struggling with the notation of MCMCglmm, especially what is meant by trait. My code is the following:
    library("MCMCglmm")
    set.seed(123)
    y <- sample(letters[1:3], size = 100, replace = TRUE)
    x <- rnorm(100)
    id <- rep(1:10, each = 10)
    dat <- data.frame(y, x, id)
    mod <- MCMCglmm(fixed = y ~ x, random = ~us(x):id, data = dat, family = "categorical")
This gives me the error message: For error structures involving catgeorical data with more than 2 categories pleasue use ...