Question
I'm struggling to generate random data sets from the predicted probabilities of a multinomial logistic regression.
Let's take an example. I'll use the nnet package for multinomial logistic regression and the wine data set from the rattle.data package.
library("nnet")
library("rattle.data")
data(wine)
multinom.fit <- multinom(Type ~ Alcohol + Color - 1, data = wine)
summary(multinom.fit)
Call:
multinom(formula = Type ~ Alcohol + Color - 1, data = wine)
Coefficients:
     Alcohol      Color
2  0.6258035 -1.9480658
3 -0.3457799  0.6944604
Std. Errors:
     Alcohol     Color
2 0.10203198 0.3204171
3 0.07042968 0.1479679
Residual Deviance: 222.5608
AIC: 230.5608
fit <- fitted(multinom.fit)
head(fit)
          1            2          3
1 0.6705935 0.0836177621 0.24578870
2 0.5050334 0.3847919037 0.11017466
3 0.6232029 0.0367975986 0.33999948
4 0.3895445 0.0007888818 0.60966664
5 0.4797392 0.4212542898 0.09900655
6 0.5510792 0.0077589278 0.44116190
So fit is a 178 x 3 matrix of predicted probabilities. I want to generate 100 random data sets using these predicted probabilities. For example, the first sample in fit has a probability of about 0.67 of being '1', 0.08 of being '2', and 0.25 of being '3'. Each sample was collected independently.
Is there a way to do this?
Answer 1:
You could try:
rand.list <- lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ]))
rand.df <- data.frame(matrix(unlist(rand.list), ncol = nrow(fit)))
This gives you a data.frame with 100 rows and 178 columns. Each column holds 100 draws sampled with the probabilities from the corresponding row of fit.
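One way to convince yourself that sample() respects the fitted probabilities is to draw many values for a single row and compare the empirical frequencies with that row. The sketch below is only an illustration: fit here is a stand-in built from the six rows printed by head(fit) in the question, and n_draws is a hypothetical name. With the real model you would use fit <- fitted(multinom.fit) instead.

```r
# Stand-in for fitted(multinom.fit): the six rows shown by head(fit) in the question.
fit <- matrix(c(0.6705935, 0.0836177621, 0.24578870,
                0.5050334, 0.3847919037, 0.11017466,
                0.6232029, 0.0367975986, 0.33999948,
                0.3895445, 0.0007888818, 0.60966664,
                0.4797392, 0.4212542898, 0.09900655,
                0.5510792, 0.0077589278, 0.44116190),
              ncol = 3, byrow = TRUE,
              dimnames = list(as.character(1:6), as.character(1:3)))

set.seed(1)        # only for reproducibility of this illustration
n_draws <- 10000   # hypothetical: many draws so the frequencies settle down

# Draw class labels for the first sample using its fitted probabilities.
draws <- sample(1:3, n_draws, replace = TRUE, prob = fit[1, ])

# Empirical class frequencies; factor() keeps classes that were never drawn.
emp <- table(factor(draws, levels = 1:3)) / n_draws

# Fitted and empirical probabilities side by side.
round(rbind(fitted = fit[1, ], empirical = as.numeric(emp)), 3)
```

The two rows of the printed matrix should be close to each other, and they get closer as n_draws grows.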
Answer 2:
I'm sorry, I didn't express myself clearly.
For example, when I run your code, the result looks like this.
head(lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ])))
[[1]]
[1] 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1
[[2]]
[1] 2 3 2 2 1 3 2 1 3 1 1 1 2 1 1 1 3 1 3 1 1 2 1 2 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 3 2 1 2 1 1 2 2 3 2 3 1 1 2 1 1 3 1 3 1
[61] 2 1 2 1 3 1 1 1 2 3 3 1 1 3 1 3 1 1 1 1 1 1 1 1 2 3 3 2 1 1 2 1 2 1 3 3 1 1 1 2
[[3]]
[1] 1 3 1 1 1 1 1 1 1 3 3 3 3 3 1 1 3 3 3 3 1 3 1 3 2 3 1 1 3 3 3 2 1 3 2 3 1 3 3 3 3 3 1 1 1 1 1 1 1 3 3 3 1 1 2 1 3 1 1 3
[61] 3 3 3 3 1 1 1 3 3 3 3 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 3 3 1 3 3 3 3 3 3 3 3 3 3
[[4]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1
[[5]]
[1] 1 3 2 1 1 1 1 1 3 2 1 2 1 2 1 1 1 3 3 3 1 2 2 3 1 1 2 1 2 1 3 3 1 1 3 3 2 3 2 1 1 2 2 1 1 1 1 1 1 2 1 3 3 1 2 2 3 1 1 1
[61] 1 1 1 2 1 2 1 1 3 3 1 1 2 1 1 1 2 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 3 1 1 1 1 3
[[6]]
[1] 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
However, is there another way to express this as a data.frame? When I call data.frame on the list, it turns out like this.
head(data.frame(lapply(1:nrow(fit), function(x) sample(1:3, 100, replace = TRUE, prob = fit[x, ]))))
(Even with head, the output was too long, so I copied only the last two columns.)
c.3L..1L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..3L..
1 3
2 1
3 3
4 3
5 3
c.3L..1L..1L..1L..3L..3L..3L..1L..1L..1L..3L..1L..1L..3L..1L..
1 3
2 1
3 1
4 1
5 3
[ reached 'max' / getOption("max.print") -- omitted 1 rows ]
I want the data to look like this.
  1 2 3 4 5 ... (omitted)
1 1 1 3 1 1
2 1 1 3 1 1
3 1 3 3 1 1
4 1 3 1 1 3
5 1 1 3 1 1
... (omitted)
So the data.frame should be 178 x 100: 178 is the number of samples, and 100 is the number of random draws per sample.
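A possible way to get the layout described above (one row per sample, one column per trial) is to let sapply() collect the draws into a matrix and then transpose it. This is a minimal sketch, not the accepted answer from the thread: fit is again a stand-in built from the six rows printed by head(fit), and n_trials and rand.mat are hypothetical names. With fit <- fitted(multinom.fit) the result would be 178 x 100.

```r
# Stand-in for fitted(multinom.fit): the six rows shown by head(fit) in the question.
fit <- matrix(c(0.6705935, 0.0836177621, 0.24578870,
                0.5050334, 0.3847919037, 0.11017466,
                0.6232029, 0.0367975986, 0.33999948,
                0.3895445, 0.0007888818, 0.60966664,
                0.4797392, 0.4212542898, 0.09900655,
                0.5510792, 0.0077589278, 0.44116190),
              ncol = 3, byrow = TRUE)

set.seed(42)       # only for reproducibility of this illustration
n_trials <- 100    # number of random draws per sample

# sapply() returns an n_trials x nrow(fit) matrix (one column per sample),
# so t() is needed to put samples in rows and trials in columns.
rand.mat <- t(sapply(seq_len(nrow(fit)), function(i) {
  sample(1:3, n_trials, replace = TRUE, prob = fit[i, ])
}))

# Convert to a data.frame and label the columns 1..n_trials.
rand.df <- as.data.frame(rand.mat)
colnames(rand.df) <- seq_len(n_trials)

dim(rand.df)          # rows = samples, columns = trials
rand.df[1:5, 1:5]     # top-left corner, in the layout shown above
```

The odd column names in the data.frame(lapply(...)) attempt most likely arise because the list elements are unnamed, so data.frame() builds the names from the values themselves. Going through a matrix avoids that.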
Source: https://stackoverflow.com/questions/56866810/how-to-generate-random-data-set-with-predicted-probability