How to avoid duplication of a certain variable during alternative-specific data organization?

Posted by 时间秒杀一切 on 2019-12-24 00:23:37

Question


UPDATE: I apologize for providing a very simplified, non-reproducible example earlier. This is something that can be reproduced:

*use this case-specific dataset:

week  units pr categ      id    avecenoz1   avecenoz2   avecenoz3
1667    1   0   1       371247  4.276693    4.871173    6.430658
1686    1   0   1       581457  4.372499    5.042025    6.45528
1656    1   0   2       217025  4.107188    4.900006    6.236501
1649    1   0   2       138704  4.355612    4.920326    6.548411
1685    1   0   3       575278  4.297557    4.971671    6.408175
1642    1   0   3       75440   4.290808    4.848145    6.384848
1655    1   0   3       204879  4.275114    4.905337    6.400794
1667    1   0   3       376364  4.276693    4.871173    6.430658
1671    1   1   3       426125  4.274153    5.001119    6.355516

id is unique customer id of a customer who made a purchase,

categ is product category the purchase belongs to,

week is the week in which the purchase took place,

avecenoz1 is an average price for category 1 during the specific week,

avecenoz2 is an average price for category 2 during the specific week,

avecenoz3 is an average price for category 3 during the specific week,

units is always equal to 1,

and pr is whether the purchase was on promotion (1) or not (0).

*Scott Long's user-written code to transform variables from case-specific to alternative-specific:

case2alt, alt(avecenoz) case(id) choice(categ) altnum(mode)

*this is what you get:

id     mode week units pr categ avecenoz choice y1  y2  y3
75440   1   1642    1   0   3   4.290808    0   1   0   0
75440   2   1642    1   0   3   4.848145    0   0   1   0
75440   3   1642    1   0   3   6.384848    1   0   0   1
138704  1   1649    1   0   2   4.355612    0   1   0   0
138704  2   1649    1   0   2   4.920326    1   0   1   0
138704  3   1649    1   0   2   6.548411    0   0   0   1
204879  1   1655    1   0   3   4.275114    0   1   0   0
204879  2   1655    1   0   3   4.905337    0   0   1   0
204879  3   1655    1   0   3   6.400794    1   0   0   1
217025  1   1656    1   0   2   4.107188    0   1   0   0
217025  2   1656    1   0   2   4.900006    1   0   1   0
217025  3   1656    1   0   2   6.236501    0   0   0   1
371247  1   1667    1   0   1   4.276693    1   1   0   0
371247  2   1667    1   0   1   4.871173    0   0   1   0
371247  3   1667    1   0   1   6.430658    0   0   0   1
376364  1   1667    1   0   3   4.276693    0   1   0   0
376364  2   1667    1   0   3   4.871173    0   0   1   0
376364  3   1667    1   0   3   6.430658    1   0   0   1
426125  1   1671    1   1   3   4.274153    0   1   0   0
426125  2   1671    1   1   3   5.001119    0   0   1   0
426125  3   1671    1   1   3   6.355516    1   0   0   1
575278  1   1685    1   0   3   4.297557    0   1   0   0
575278  2   1685    1   0   3   4.971671    0   0   1   0
575278  3   1685    1   0   3   6.408175    1   0   0   1
581457  1   1686    1   0   1   4.372499    1   1   0   0
581457  2   1686    1   0   1   5.042025    0   0   1   0 
581457  3   1686    1   0   1   6.45528     0   0   0   1

As you can see, after the transformation pr is repeated on all three rows of each id. However, for each replicated transaction, the indicator should equal one only on the row for the item that was sold (choice == 1), not on the rows for the other alternatives. Please help me prevent pr from duplicating itself. Thank you!


Original message:

I am transforming a dataset from being case-specific to alternative-specific. The original dataset looks something like this:

id   category   week   price1   price2    price3     pr

 1       1        1     4.24     4.88     3.35       1
 2       2        1     4.24     4.88     3.35       0
 3       3        1     4.24     4.88     3.35       1
 4       2        1     4.24     4.88     3.35       0

where:

id is unique customer id of a customer who made a purchase,

category is product category the purchase belongs to,

week is the week in which the purchase took place,

price1 is an average price for category 1 during the specific week,

price2 is an average price for category 2 during the specific week,

price3 is an average price for category 3 during the specific week,

and pr is whether the purchase was on promotion (1) or not (0).

How do I make sure that pr doesn't duplicate itself after the transformation?

By using the code

case2alt, alt(price) case(id) choice(category) altnum(mode)

this is what I get:

id   mode   week  cater  choice  price   y1    y2    y3    pr

 1     1     1      1       1    4.24     1     0     0    1
 1     2     1      1       0    4.88     0     1     0    1
 1     3     1      1       0    3.35     0     0     1    1
 2     1     1      1       0    4.24     1     0     0    0
 2     2     1      1       1    4.88     0     1     0    0
 2     3     1      1       0    3.35     0     0     1    0
 3     1     1      1       0    4.24     1     0     0    1
 3     2     1      1       0    4.88     0     1     0    1
 3     3     1      1       1    3.35     0     0     1    1
 4     1     1      1       0    4.24     1     0     0    0
 4     2     1      1       1    4.88     0     1     0    0
 4     3     1      1       0    3.35     0     0     1    0

Everything works well, except for pr. I don't want it to be duplicated across all possible alternatives for each customer id. For each replicated transaction, the indicator variable should equal one only for the item that was sold, not for the other choices:

id   mode   week  cater  choice  price   y1    y2    y3    pr

 1     1     1      1       1    4.24     1     0     0    1
 1     2     1      1       0    4.88     0     1     0    0
 1     3     1      1       0    3.35     0     0     1    0
 2     1     1      1       0    4.24     1     0     0    0
 2     2     1      1       1    4.88     0     1     0    0
 2     3     1      1       0    3.35     0     0     1    0
 3     1     1      1       0    4.24     1     0     0    0
 3     2     1      1       0    4.88     0     1     0    0
 3     3     1      1       1    3.35     0     0     1    1
 4     1     1      1       0    4.24     1     0     0    0
 4     2     1      1       1    4.88     0     1     0    0
 4     3     1      1       0    3.35     0     0     1    0

Is it possible to do?

Thank you a lot!


Answer 1:


My original answer using merge should work fine. Below is an example.

clear all
set more off

*----- original data -----

input ///
id   catchosen   week   pricea   priceb    pricec     pr
 1       1        1     4.24     4.88     3.35       1
 2       2        1     4.24     4.88     3.35       0
 3       3        1     4.24     4.88     3.35       1
 4       2        1     4.24     4.88     3.35       0
end

list

* modify some things to do a -merge- later on
rename catchosen alt
rename pr pr2

* save this data in a temporary file
tempfile orig
save "`orig'"


*----- data that your command produces -----

clear all

input ///
id   alt   week  cater  choice  price   y1    y2    y3    pr
 1     1     1      1       1    4.24     1     0     0    1
 1     2     1      1       0    4.88     0     1     0    1
 1     3     1      1       0    3.35     0     0     1    1
 2     1     1      1       0    4.24     1     0     0    0
 2     2     1      1       1    4.88     0     1     0    0
 2     3     1      1       0    3.35     0     0     1    0
 3     1     1      1       0    4.24     1     0     0    1
 3     2     1      1       0    4.88     0     1     0    1
 3     3     1      1       1    3.35     0     0     1    1
 4     1     1      1       0    4.24     1     0     0    0
 4     2     1      1       1    4.88     0     1     0    0
 4     3     1      1       0    3.35     0     0     1    0
end

* merge this data with the original data. keep only -pr2-
merge 1:1 id alt using "`orig'", keepusing(pr2)
replace pr2 = 0 if missing(pr2)

* compare -pr- with -pr2-. the latter is what you want.
list, sepby(id)  
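
To see why the merge gives the desired result, it may help to look at _merge on a cut-down version of the same data. The sketch below keeps only ids 1 and 2 and only the variables needed for the match (that reduction is an assumption made for brevity). Because the case-level file has one row per id, keyed on the chosen alternative, only the chosen row of each id finds a match.

```stata
clear
* case-level data: one row per purchase, alt = chosen category
input id alt pr2
1 1 1
2 2 0
end
tempfile orig
save "`orig'"

clear
* alternative-level data: pr is repeated on every row of each id
input id alt choice pr
1 1 1 1
1 2 0 1
1 3 0 1
2 1 0 0
2 2 1 0
2 3 0 0
end

merge 1:1 id alt using "`orig'", keepusing(pr2)

* only the chosen alternative is matched (_merge == 3)
assert (_merge == 3) == (choice == 1)

* unmatched master rows are the non-chosen alternatives
replace pr2 = 0 if _merge == 1

list id alt choice pr pr2 _merge, sepby(id)
```
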

But as Joe Canner pointed out on Statalist.org, a simple:

replace pr = 0 if choice == 0

after executing case2alt, should also work and is much simpler.
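
The one-line fix can be tried on three rows of the dataset from the question's update. This is only a sketch: since case2alt is user-written and may not be installed, official reshape long is used here as a stand-in, and choice is built by hand as mode == categ. Both are assumptions about what case2alt produces, based on the output shown in the question.

```stata
clear
* three purchases taken from the question's updated dataset
input week units pr categ id avecenoz1 avecenoz2 avecenoz3
1667 1 0 1 371247 4.276693 4.871173 6.430658
1649 1 0 2 138704 4.355612 4.920326 6.548411
1671 1 1 3 426125 4.274153 5.001119 6.355516
end

* stand-in for case2alt: one row per (id, alternative);
* mode receives the numeric suffix of avecenoz1-avecenoz3
reshape long avecenoz, i(id) j(mode)

* mark the alternative that was actually purchased
generate byte choice = (mode == categ)

* keep the promotion flag only on the purchased alternative
replace pr = 0 if choice == 0

list id mode categ avecenoz choice pr, sepby(id)
```

After this, pr stays 1 only where the purchase was both chosen and on promotion (id 426125, mode 3 in these rows).
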



Source: https://stackoverflow.com/questions/24433561/how-to-avoid-duplication-of-a-certain-variable-during-alternative-specific-data
