Handling alternative-specific NA values in mlogit

Submitted by ◇◆丶佛笑我妖孽 on 2021-02-08 10:18:08

Question


It is common in mode choice models to have variables that vary with alternatives ("generic variables") but that are undefined for certain modes. For example, transit fare is present for bus and light rail, but undefined for automobiles and biking. Note that the fare is not zero.

I'm trying to make this work with the mlogit package for R. In this MWE I've asserted that price is undefined for fishing from the beach. This results in a singularity error.

library(mlogit)
#> Warning: package 'mlogit' was built under R version 3.5.2
#> Loading required package: Formula
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: lmtest

data("Fishing", package = "mlogit")
Fishing$price.beach <- NA
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
head(Fish)
#>            mode   income     alt   price  catch chid
#> 1.beach   FALSE 7083.332   beach      NA 0.0678    1
#> 1.boat    FALSE 7083.332    boat 157.930 0.2601    1
#> 1.charter  TRUE 7083.332 charter 182.930 0.5391    1
#> 1.pier    FALSE 7083.332    pier 157.930 0.0503    1
#> 2.beach   FALSE 1250.000   beach      NA 0.1049    2
#> 2.boat    FALSE 1250.000    boat  10.534 0.1574    2

mlogit(mode ~ catch + price | income, data = Fish, na.action = na.omit)
#> Error in solve.default(H, g[!fixed]): system is computationally singular: reciprocal condition number = 3.92205e-24

Created on 2019-07-08 by the reprex package (v0.2.1)

This happens when price is moved to the alternative-specific variable position as well. I think the issue may lie in the na.action function argument, but I can't find any documentation on this argument beyond the basic documentation tag:

na.action: a function which indicates what should happen when the data contains NAs

There appear to be no examples showing how this term is used differently and what the results are. There's a related unanswered question here.


Answer 1:


There appear to be a few things going on.

I am not quite sure how na.action = na.omit works under the hood, but it sounds to me like it drops every row that contains an NA, which here would mean every beach row. I always find it better to do this explicitly.

When you drop those rows, every individual who chose the dropped alternative is left with a choice occasion in which no alternative is marked as chosen. This is not going to work: logit probabilities are defined over the alternatives in the choice set, and a choice occasion with no chosen alternative adds no information, so those choice occasions have to be dropped entirely. With both steps done (dropping the NA rows, then dropping the occasions left without a choice), I am able to run the model you propose.
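To see the first problem in isolation, here is a small base-R sketch on made-up data (the `toy` data frame and its column names are illustrative, not taken from the Fishing data). It only shows what `na.omit()` does to a plain data frame; whether mlogit applies `na.action` in exactly this way internally is an assumption I have not verified.

```r
# Toy long-format data: two choice occasions, three alternatives each.
# Price is undefined (NA) for the "beach" alternative.
toy <- data.frame(
  chid   = rep(1:2, each = 3),
  alt    = rep(c("beach", "boat", "pier"), times = 2),
  chosen = c(TRUE, FALSE, FALSE,   # occasion 1: beach was chosen
             FALSE, TRUE, FALSE),  # occasion 2: boat was chosen
  price  = c(NA, 10, 12, NA, 8, 9)
)

# na.omit() on a data frame removes every row containing at least one NA,
# so both "beach" rows disappear
toy_complete <- na.omit(toy)
print(toy_complete)

# Count the chosen alternatives left in each choice occasion
chosen_per_chid <- tapply(toy_complete$chosen, toy_complete$chid, sum)
print(chosen_per_chid)
```

Occasion 1 ends up with zero chosen alternatives, which is the situation described above.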

Here is a commented working example:

library(mlogit)

# Read in the data
data("Fishing", package = "mlogit")

# Set price for the beach option to NA
Fishing$price.beach <- NA

# Scale income
Fishing$income <- Fishing$income / 10000

# Turn into 'mlogit' data
fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")

# Explicitly drop the alts with NA in price
fish <- fish[fish$alt != "beach", ]

# Dropping the beach rows leaves some choice occasions with no chosen
# alternative (the individuals who chose beach); count choices per occasion.
# This assumes rows are sorted by choice occasion with exactly 3 rows each.
fish$choice_made <- rep(colSums(matrix(fish$mode, nrow = 3)), each = 3)

fish <- fish[fish$choice_made == 1, ]

fish <- mlogit.data(fish, shape = "long", alt.var = "alt", choice = "mode")

# Run an MNL model
mnl <- mlogit(mode ~ catch + price | income, data = fish)
summary(mnl)
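The matrix-based count in the example depends on the rows being sorted by choice occasion with the same number of alternatives in each. If that cannot be guaranteed, one possible alternative is to count within groups using base R's `ave()`. The sketch below uses a made-up data frame (`long`) rather than the Fishing data, so the column names are assumptions.

```r
# Hypothetical long-format data after the "beach" rows were removed
long <- data.frame(
  chid = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  alt  = rep(c("boat", "charter", "pier"), times = 3),
  mode = c(FALSE, FALSE, FALSE,  # occasion 1: the original choice was beach
           FALSE, TRUE,  FALSE,
           TRUE,  FALSE, FALSE)
)

# ave() writes the group total onto every row of the group, so the result
# does not depend on row order or on a fixed number of rows per occasion
long$choice_made <- ave(as.integer(long$mode), long$chid, FUN = sum)

# Keep only the choice occasions with exactly one chosen alternative
kept <- long[long$choice_made == 1, ]
print(unique(kept$chid))
```

Here occasion 1 is dropped and occasions 2 and 3 are kept.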

In general, when working with these models, I find it very useful to always make all data transformations before running a model rather than rely on functions such as na.action.
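As one way of putting that advice into practice, the cleaned data can be checked before it is handed to the model. The helper below, `check_choice_sets()`, is hypothetical and not part of mlogit; it is a minimal sketch assuming long-format data with one row per alternative and a logical choice column.

```r
# Hypothetical helper: checks that every choice occasion has exactly one
# chosen alternative and that the listed variables contain no NA
check_choice_sets <- function(data, id = "chid", choice = "mode",
                              vars = character()) {
  n_chosen <- tapply(data[[choice]], data[[id]], sum)
  c(
    one_choice_each = isTRUE(all(n_chosen == 1)),
    no_missing      = !any(is.na(data[vars]))
  )
}

# Made-up cleaned data: two occasions, two alternatives, one choice each
clean <- data.frame(
  chid  = rep(1:2, each = 2),
  alt   = rep(c("boat", "pier"), times = 2),
  mode  = c(TRUE, FALSE, FALSE, TRUE),
  price = c(10, 12, 8, 9)
)

# Made-up uncleaned data: occasion 1 has no choice and one price is missing
raw <- clean
raw$mode[1]  <- FALSE
raw$price[2] <- NA

print(check_choice_sets(clean, vars = "price"))
print(check_choice_sets(raw, vars = "price"))
```

Both checks pass for `clean` and both fail for `raw`, so the problem surfaces before the model is fitted rather than as a singularity error.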



Source: https://stackoverflow.com/questions/56939628/handling-alternative-specific-na-values-in-mlogit
