Question
UPDATE: I apologize for providing a very simplified, non-reproducible example earlier. This is something that can be reproduced:
*use this case-specific dataset:
week units pr categ id avecenoz1 avecenoz2 avecenoz3
1667 1 0 1 371247 4.276693 4.871173 6.430658
1686 1 0 1 581457 4.372499 5.042025 6.45528
1656 1 0 2 217025 4.107188 4.900006 6.236501
1649 1 0 2 138704 4.355612 4.920326 6.548411
1685 1 0 3 575278 4.297557 4.971671 6.408175
1642 1 0 3 75440 4.290808 4.848145 6.384848
1655 1 0 3 204879 4.275114 4.905337 6.400794
1667 1 0 3 376364 4.276693 4.871173 6.430658
1671 1 1 3 426125 4.274153 5.001119 6.355516
id is the unique customer id of the customer who made the purchase,
categ is the product category the purchase belongs to,
week is the week in which the purchase took place,
avecenoz1 is the average price for category 1 during that week,
avecenoz2 is the average price for category 2 during that week,
avecenoz3 is the average price for category 3 during that week,
units is always equal to 1,
and pr indicates whether the purchase was on promotion (1) or not (0).
*Scott Long's user-written code to transform variables from case-specific to alternative-specific:
case2alt, alt(avecenoz) case(id) choice(categ) altnum(mode)
*this is what you get:
id mode week units pr cater avecenoz choice y1 y2 y3
75440 1 1642 1 0 3 4.290808 0 1 0 0
75440 2 1642 1 0 3 4.848145 0 0 1 0
75440 3 1642 1 0 3 6.384848 1 0 0 1
138704 1 1649 1 0 2 4.355612 0 1 0 0
138704 2 1649 1 0 2 4.920326 1 0 1 0
138704 3 1649 1 0 2 6.548411 0 0 0 1
204879 1 1655 1 0 3 4.275114 0 1 0 0
204879 2 1655 1 0 3 4.905337 0 0 1 0
204879 3 1655 1 0 3 6.400794 1 0 0 1
217025 1 1656 1 0 2 4.107188 0 1 0 0
217025 2 1656 1 0 2 4.900006 1 0 1 0
217025 3 1656 1 0 2 6.236501 0 0 0 1
371247 1 1667 1 0 1 4.276693 1 1 0 0
371247 2 1667 1 0 1 4.871173 0 0 1 0
371247 3 1667 1 0 1 6.430658 0 0 0 1
376364 1 1667 1 0 3 4.276693 0 1 0 0
376364 2 1667 1 0 3 4.871173 0 0 1 0
376364 3 1667 1 0 3 6.430658 1 0 0 1
426125 1 1671 1 1 3 4.274153 0 1 0 0
426125 2 1671 1 1 3 5.001119 0 0 1 0
426125 3 1671 1 1 3 6.355516 1 0 0 1
575278 1 1685 1 0 3 4.297557 0 1 0 0
575278 2 1685 1 0 3 4.971671 0 0 1 0
575278 3 1685 1 0 3 6.408175 1 0 0 1
581457 1 1686 1 0 1 4.372499 1 1 0 0
581457 2 1686 1 0 1 5.042025 0 0 1 0
581457 3 1686 1 0 1 6.45528 0 0 0 1
As you can see, after the transformation pr is copied onto all three rows of each case. However, for each replicated transaction, the indicator should equal one only on the row for the item that was sold, and zero on the rows for the other alternatives. Please help me prevent pr from being duplicated. Thank you!
Original message:
I am transforming a dataset from being case-specific to alternative-specific. The original dataset looks something like this:
id category week price1 price2 price3 pr
1 1 1 4.24 4.88 3.35 1
2 2 1 4.24 4.88 3.35 0
3 3 1 4.24 4.88 3.35 1
4 2 1 4.24 4.88 3.35 0
where:
id is the unique customer id of the customer who made the purchase,
category is the product category the purchase belongs to,
week is the week in which the purchase took place,
price1 is the average price for category 1 during that week,
price2 is the average price for category 2 during that week,
price3 is the average price for category 3 during that week,
and pr indicates whether the purchase was on promotion (1) or not (0).
How do I make sure that pr doesn't duplicate itself after the transformation?
By using the code
case2alt, alt(price) case(id) choice(category) altnum(mode)
this is what I get:
id mode week cater choice price y1 y2 y3 pr
1 1 1 1 1 4.24 1 0 0 1
1 2 1 1 0 4.88 0 1 0 1
1 3 1 1 0 3.35 0 0 1 1
2 1 1 1 0 4.24 1 0 0 0
2 2 1 1 1 4.88 0 1 0 0
2 3 1 1 0 3.35 0 0 1 0
3 1 1 1 0 4.24 1 0 0 1
3 2 1 1 0 4.88 0 1 0 1
3 3 1 1 1 3.35 0 0 1 1
4 1 1 1 0 4.24 1 0 0 0
4 2 1 1 1 4.88 0 1 0 0
4 3 1 1 0 3.35 0 0 1 0
Everything works well except for pr. I don't want it copied to every alternative for each customer id. For each replicated transaction, the indicator should equal one only on the row for the item that was sold, and zero on the rows for the other alternatives:
id mode week cater choice price y1 y2 y3 pr
1 1 1 1 1 4.24 1 0 0 1
1 2 1 1 0 4.88 0 1 0 0
1 3 1 1 0 3.35 0 0 1 0
2 1 1 1 0 4.24 1 0 0 0
2 2 1 1 1 4.88 0 1 0 0
2 3 1 1 0 3.35 0 0 1 0
3 1 1 1 0 4.24 1 0 0 0
3 2 1 1 0 4.88 0 1 0 0
3 3 1 1 1 3.35 0 0 1 1
4 1 1 1 0 4.24 1 0 0 0
4 2 1 1 1 4.88 0 1 0 0
4 3 1 1 0 3.35 0 0 1 0
Is this possible?
Thanks a lot!
Answer 1:
My original answer using merge should work fine. Below is an example.
clear all
set more off
*----- original data -----
input ///
id catchosen week pricea priceb pricec pr
1 1 1 4.24 4.88 3.35 1
2 2 1 4.24 4.88 3.35 0
3 3 1 4.24 4.88 3.35 1
4 2 1 4.24 4.88 3.35 0
end
list
* modify some things to do a -merge- later on
rename catchosen alt
rename pr pr2
* save this data in a temporary file
tempfile orig
save "`orig'"
*----- data that your command produces -----
clear all
input ///
id alt week cater choice price y1 y2 y3 pr
1 1 1 1 1 4.24 1 0 0 1
1 2 1 1 0 4.88 0 1 0 1
1 3 1 1 0 3.35 0 0 1 1
2 1 1 1 0 4.24 1 0 0 0
2 2 1 1 1 4.88 0 1 0 0
2 3 1 1 0 3.35 0 0 1 0
3 1 1 1 0 4.24 1 0 0 1
3 2 1 1 0 4.88 0 1 0 1
3 3 1 1 1 3.35 0 0 1 1
4 1 1 1 0 4.24 1 0 0 0
4 2 1 1 1 4.88 0 1 0 0
4 3 1 1 0 3.35 0 0 1 0
end
* merge this data with the original data. keep only -pr2-
merge 1:1 id alt using "`orig'", keepusing(pr2)
replace pr2 = 0 if missing(pr2)
* compare -pr- with -pr2-. the latter is what you want.
list, sepby(id)
But as Joe Canner pointed out on Statalist.org, a simple
replace pr = 0 if choice == 0
after executing case2alt should also work and is much simpler.
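To see the one-line fix in action without needing case2alt installed, here is a minimal sketch on a cut-down copy of the long-form data from the question. Only id, mode, choice and pr are kept, and npr is a helper variable introduced here purely for the check; it is not something case2alt creates.

```stata
clear
* long-form data as case2alt leaves it: pr is repeated on every row of a case
input id mode choice pr
1 1 1 1
1 2 0 1
1 3 0 1
2 1 0 0
2 2 1 0
2 3 0 0
3 1 0 1
3 2 0 1
3 3 1 1
4 1 0 0
4 2 1 0
4 3 0 0
end

* keep the promotion flag only on the chosen alternative
replace pr = 0 if choice == 0

* sanity checks: no unchosen row carries pr, and at most one row per id does
assert pr == 0 if choice == 0
bysort id: egen npr = total(pr)
assert npr <= 1

list id mode choice pr, sepby(id)
```

This relies on each id having exactly one row with choice == 1, which holds in the example data. If a customer could appear in several transactions, the case identifier would need to separate them before the same logic applies.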
Source: https://stackoverflow.com/questions/24433561/how-to-avoid-duplication-of-a-certain-variable-during-alternative-specific-data