How to generate random data sets from predicted probabilities?

Question

I'm struggling to generate random data sets from the predicted probabilities of a multinomial logistic regression.

Let's take an example. I'll use the nnet package for the multinomial logistic regression and the wine data set from the rattle.data package.

library("nnet")
library("rattle.data")
data(wine)
multinom.fit <- multinom(Type ~ Alcohol + Color - 1, data = wine)
summary(multinom.fit)

Call:
multinom(formula = Type ~ Alcohol + Color - 1, data = wine)

Coefficients:
     Alcohol      Color
2  0.6258035 -1.9480658
3 -0.3457799  0.6944604

Std. Errors:
     Alcohol     Color
2 0.10203198 0.3204171
3 0.07042968 0.1479679

Residual Deviance: 222.5608 
AIC: 230.5608 

fit<-fitted(multinom.fit)
head(fit)

          1            2          3
1 0.6705935 0.0836177621 0.24578870
2 0.5050334 0.3847919037 0.11017466
3 0.6232029 0.0367975986 0.33999948
4 0.3895445 0.0007888818 0.60966664
5 0.4797392 0.4212542898 0.09900655
6 0.5510792 0.0077589278 0.44116190

So fit is a 178 x 3 matrix of predicted probabilities. I want to generate 100 random data sets using these predicted probabilities. For example, the first sample in fit has a probability of about 0.67 of being '1', 0.08 of being '2', and 0.25 of being '3'. Each sample was collected independently.

Is there a way to do this?

Answer 1:

You could try:

rand.list <- lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ]))
rand.df   <- data.frame(matrix(unlist(rand.list), ncol = nrow(fit)))

This gives you a data.frame with 100 rows and 178 columns. Each column holds 100 draws sampled with the probabilities in the corresponding row of fit.
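As a quick sanity check on this approach, the sketch below repeats it on a stand-in probability matrix and compares the simulated class frequencies of the first observation with its predicted probabilities. The stand-in `fit6` holds only the six rows printed by head(fit) in the question, so no model fit is needed. The names `fit6`, `n.draws` and `freq1`, the seed, and the larger number of draws (10000 instead of 100) are assumptions made for illustration only.

```r
# Stand-in for `fit`: the six rows shown by head(fit) in the question.
fit6 <- matrix(c(0.6705935, 0.0836177621, 0.24578870,
                 0.5050334, 0.3847919037, 0.11017466,
                 0.6232029, 0.0367975986, 0.33999948,
                 0.3895445, 0.0007888818, 0.60966664,
                 0.4797392, 0.4212542898, 0.09900655,
                 0.5510792, 0.0077589278, 0.44116190),
               ncol = 3, byrow = TRUE,
               dimnames = list(1:6, c("1", "2", "3")))

set.seed(1)       # arbitrary seed, only for reproducibility
n.draws <- 10000  # assumption: more draws than the 100 used in the answer

# Same construction as in the answer, applied to the stand-in matrix
rand.list <- lapply(seq_len(nrow(fit6)), function(x)
  sample(1:3, n.draws, replace = TRUE, prob = fit6[x, ]))
rand.df <- data.frame(matrix(unlist(rand.list), ncol = nrow(fit6)))

dim(rand.df)  # n.draws rows, one column per observation

# Simulated class frequencies for observation 1 next to its predicted probabilities
freq1 <- table(factor(rand.df[[1]], levels = 1:3)) / n.draws
round(rbind(simulated = as.numeric(freq1), predicted = fit6[1, ]), 3)
```

With this many draws the two rows of the final table should be close to each other. With only 100 draws per observation, larger deviations are expected.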

Answer 2:

Sorry, I didn't express myself clearly.

For example, when I run your code, the result looks like this.

head(lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ])))
[[1]]
  [1] 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
 [61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1

[[2]]
  [1] 2 3 2 2 1 3 2 1 3 1 1 1 2 1 1 1 3 1 3 1 1 2 1 2 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 3 2 1 2 1 1 2 2 3 2 3 1 1 2 1 1 3 1 3 1
 [61] 2 1 2 1 3 1 1 1 2 3 3 1 1 3 1 3 1 1 1 1 1 1 1 1 2 3 3 2 1 1 2 1 2 1 3 3 1 1 1 2

[[3]]
  [1] 1 3 1 1 1 1 1 1 1 3 3 3 3 3 1 1 3 3 3 3 1 3 1 3 2 3 1 1 3 3 3 2 1 3 2 3 1 3 3 3 3 3 1 1 1 1 1 1 1 3 3 3 1 1 2 1 3 1 1 3
 [61] 3 3 3 3 1 1 1 3 3 3 3 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3

[[4]]
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 1 1 1 1
 [61] 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1

[[5]]
  [1] 1 3 2 1 1 1 1 1 3 2 1 2 1 2 1 1 1 3 3 3 1 2 2 3 1 1 2 1 2 1 3 3 1 1 3 3 2 3 2 1 1 2 2 1 1 1 1 1 1 2 1 3 3 1 2 2 3 1 1 1
 [61] 1 1 1 2 1 2 1 1 3 3 1 1 2 1 1 1 2 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 3 1 1 1 1 3

[[6]]
  [1] 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1
 [61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

However, is there another way to express this as a data.frame? When I call data.frame() on it, the result looks like this.

head(data.frame(lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ]))))

*Even with head(), the output was too long, so I copied only the last two columns.

  c.3L..1L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..
1                                                              3
2                                                              1
3                                                              3
4                                                              3
5                                                              3
  c.3L..1L..1L..1L..3L..3L..3L..1L..1L..1L..3L..1L..1L..3L..1L..
1                                                              3
2                                                              1
3                                                              1
4                                                              1
5                                                              3
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

I want to express the data like this.

  1 2 3 4 5 ... (omitted)
1 1 1 3 1 1
2 1 1 3 1 1
3 1 3 3 1 1
4 1 3 1 1 3
5 1 1 3 1 1
... (omitted)

So the data.frame should be 178 x 100, where 178 is the number of samples and 100 is the number of random trials.
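One possible way to get that layout, offered as a sketch rather than as an accepted answer from the thread: draw all trials for each observation with apply() over the rows of the probability matrix, then transpose so that rows are observations and columns are trials. The example uses the six rows printed by head(fit) as a stand-in for the full matrix. The names `fit6`, `n.trials`, `sim` and `sim.df`, and the seed, are illustrative choices. The long c.3L..1L.. headers in the output above arise because data.frame() derives column names from the contents of an unnamed list. Building the data.frame from a matrix with explicit column names avoids that.

```r
# Stand-in for `fit`: the six rows shown by head(fit) in the question.
fit6 <- matrix(c(0.6705935, 0.0836177621, 0.24578870,
                 0.5050334, 0.3847919037, 0.11017466,
                 0.6232029, 0.0367975986, 0.33999948,
                 0.3895445, 0.0007888818, 0.60966664,
                 0.4797392, 0.4212542898, 0.09900655,
                 0.5510792, 0.0077589278, 0.44116190),
               ncol = 3, byrow = TRUE,
               dimnames = list(1:6, c("1", "2", "3")))

set.seed(42)     # arbitrary seed, only for reproducibility
n.trials <- 100  # number of random trials per observation

# apply() over rows returns an n.trials x n.obs matrix (one column per
# observation); t() flips it to n.obs x n.trials.
sim <- t(apply(fit6, 1, function(p)
  sample(seq_along(p), n.trials, replace = TRUE, prob = p)))

# Name the trial columns 1..n.trials and keep those names as they are
colnames(sim) <- seq_len(n.trials)
sim.df <- data.frame(sim, check.names = FALSE)

dim(sim.df)    # observations in rows, trials in columns
sim.df[, 1:5]  # first five trials for every observation
```

Replacing `fit6` with the full `fit` matrix from the question should give the 178 x 100 data.frame described above.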

Source: https://stackoverflow.com/questions/56866810/how-to-generate-random-data-set-with-predicted-probability

Tags

simulation

prediction

categorical-data