Question
1) A new variable should be created for each unique value of the variable sku, which contains repeated values.
2) Each new variable should be assigned its own product's price at the store/week level in every observation whose sku value is in the same subcategory (subc) as the variable's product. For example, in eta2,3, the observations in lines 3, 4, and 5 have the same value because they all belong to the same subcategory as sku 3. [eta2,3 indicates sku 3, subc 2.]
3) x indicates that this is the original value for the product/subcategory that is currently being replicated.
4) If an observation doesn't belong to the same subcategory, the variable should be 0.
In the spreadsheet example, orange cells are the given data, green cells are the values from steps 1, 2, and 3, and white cells are step 4.
I am unable to offer a solution of my own, as searching for a way to generate a variable using existing observations hasn't given me results.
I also understand that it must be a combination of the forvalues, foreach, and levelsof commands?
clear
input units price sku week store subc
3 4.3 1 1 1 1
2 3 2 1 1 1
1 2.5 3 1 1 2
4 12 5 1 1 2
5 12 6 1 1 3
35 4.3 1 1 2 1
23 3 2 1 2 1
12 2.5 3 1 2 2
35 12 5 1 2 2
35 12 6 1 2 3
3 20 1 2 1 1
2 30 2 2 1 1
4 40 3 2 2 2
1 50 4 2 2 2
9 10 5 2 2 2
2 90 6 2 2 3
end
UPDATE: Based on Nick Cox's feedback, this is the final code that gives the result I have been looking for:
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
egen joint = group(subc sku), label
bysort store week : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
tostring subc sku, replace
gen new = subc + "_"+sku
su joint, meanonly
forval j = 1/`r(max)' {
    local J = new[`j']
    gen eta`J' = .
}
sort subc week store sku
egen joint1 = group(subc week store), label
gen long id = _n
su joint1, meanonly
quietly forval i = 1/`r(max)' {
    su id if joint1 == `i', meanonly
    local jmin = r(min)
    local jmax = r(max)
    forval j = `jmin'/`jmax' {
        local subc = subc[`j']
        local sku = sku[`j']
        replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax'
        replace eta`subc'_`sku' = 0 in `j'/`j'
    }
}
Answer 1:
I worry on your behalf that in a dataset of any size what you ask for would mean many, many extra variables. I wonder on your behalf whether you need all of them anyway for whatever you want to do with them.
That aside, this seems to be what you want. Naturally, the column headers in your spreadsheet view aren't legal variable names. Disclosure: despite being the original author of levelsof, I wouldn't prefer its use here.
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
* subc identifiers guaranteed to be integers 1 up
egen subc_id = group(subc), label
* observation numbers in a variable
gen long id = _n
* how many subc? loop over the range
su subc_id, meanonly
forval i = 1/`r(max)' {
    * which subc is this one? look it up using -summarize-
    * assuming that subc is numeric!
    su subc if subc_id == `i', meanonly
    local I = r(min)
    * which observation numbers for this subc?
    * given the prior sort, they are all contiguous
    su id if subc_id == `i', meanonly
    * for each observation in the subc, find out the sku and copy its price
    * to all observations in that subc
    forval j = `r(min)'/`r(max)' {
        local J = sku[`j']
        gen eta_`I'_`J' = cond(subc_id == `i', price[`j'], 0)
    }
}
list subc eta*, sepby(subc)
     +------------------------------------------------------------------+
     | subc   eta_1_1   eta_1_2   eta_2_3   eta_2_4   eta_2_5   eta_3_6 |
     |------------------------------------------------------------------|
  1. |    1       4.3         3         0         0         0         0 |
  2. |    1       4.3         3         0         0         0         0 |
     |------------------------------------------------------------------|
  3. |    2         0         0       2.5         1        12         0 |
  4. |    2         0         0       2.5         1        12         0 |
  5. |    2         0         0       2.5         1        12         0 |
     |------------------------------------------------------------------|
  6. |    3         0         0         0         0         0        12 |
     +------------------------------------------------------------------+
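For comparison, since the question asks about levelsof: the same six variables can probably also be built with nested levelsof and foreach loops. This is only a sketch, not the approach preferred above. It assumes that subc and sku take integer values and that each sku occurs exactly once within its subc, which holds for this single store/week example but not for data with several stores or weeks (there r(mean) would average prices across them).

```stata
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end

* distinct subc values, assumed to be integers
levelsof subc, local(subcs)
foreach s of local subcs {
    * distinct sku values within this subc
    levelsof sku if subc == `s', local(skus)
    foreach k of local skus {
        * exactly one observation per (subc, sku) here,
        * so r(mean) is simply that observation's price
        summarize price if subc == `s' & sku == `k', meanonly
        generate eta_`s'_`k' = cond(subc == `s', r(mean), 0)
    }
}
list subc eta*, sepby(subc)
```

The trade-off is that every levelsof and summarize call is another pass through the data, whereas the code above sorts once and then works with observation numbers.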
Notes:
N1. In your example, subc is numbered 1, 2, etc. My extra variable subc_id ensures that to be true even if in your real data the identifiers are not so clean.
N2. The expression
cond(subc_id == `i', price[`j'], 0)
could also be
(subc_id == `i') * price[`j']
N3. It seems possible that a different data structure would be much more efficient.
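A caveat on N2: the two expressions agree only while price is not missing. Stata propagates missing values through arithmetic, so 0 times missing is missing, whereas cond() returns its third argument, 0, whenever the condition is false. A minimal check with made-up values follows; flag, via_cond and via_mult are hypothetical names used only for illustration.

```stata
clear
input flag price
1 4.3
0 4.3
1 .
0 .
end

* the two forms mentioned in N2, applied to a toy indicator
gen via_cond = cond(flag == 1, price, 0)
gen via_mult = (flag == 1) * price
list, noobs

* they agree in the first three observations ...
assert via_cond == via_mult in 1/3
* ... and differ in the last: condition false, price missing
assert via_cond == 0 & missing(via_mult) in 4
```

If missing prices are possible in the real data, the cond() form is the one that matches step 4 of the question.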
EDIT: Here is code and results for another data structure.
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
end
sort subc sku
egen subc_id = group(subc), label
bysort subc : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
    gen eta`j' = .
    gen which`j' = .
}
gen long id = _n
su subc_id, meanonly
quietly forval i = 1/`r(max)' {
    su id if subc_id == `i', meanonly
    local jmin = r(min)
    local jmax = r(max)
    local k = 1
    forval j = `jmin'/`jmax' {
        replace which`k' = sku[`j'] in `jmin'/`jmax'
        replace eta`k' = price[`j'] in `jmin'/`jmax'
        local ++k
    }
}
list subc sku *1 *2 *3 , sepby(subc)
     +------------------------------------------------------------+
     | subc   sku   eta1   which1   eta2   which2   eta3   which3 |
     |------------------------------------------------------------|
  1. |    1     1    4.3        1      3        2      .        . |
  2. |    1     2    4.3        1      3        2      .        . |
     |------------------------------------------------------------|
  3. |    2     3    2.5        3      1        4     12        5 |
  4. |    2     4    2.5        3      1        4     12        5 |
  5. |    2     5    2.5        3      1        4     12        5 |
     |------------------------------------------------------------|
  6. |    3     6     12        6      .        .      .        . |
     +------------------------------------------------------------+
Answer 2:
I am adding another answer that tackles combinations of subc and week. Previous discussion establishes that what you are trying to do would add an extra variable for every observation. This can't be a good idea! At best, you might just have many new variables, mostly zeros. At worst, you will run into Stata's limits.
Hence I won't support your endeavour to go further down the same road, but show how the second data structure I discuss in my previous answer can be produced. Indeed, you haven't indicated (a) why you want all these variables, which are just the existing data redistributed; (b) what your strategy is for dealing with them; (c) why rangestat (SSC) or some other program could not remove the need to create them in the first place.
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
sort subc week sku
egen joint = group(subc week), label
bysort joint : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
forval j = 1/`jmax' {
    gen eta`j' = .
    gen which`j' = .
}
gen long id = _n
su joint, meanonly
quietly forval i = 1/`r(max)' {
    su id if joint == `i', meanonly
    local jmin = r(min)
    local jmax = r(max)
    local k = 1
    forval j = `jmin'/`jmax' {
        replace which`k' = sku[`j'] in `jmin'/`jmax'
        replace eta`k' = price[`j'] in `jmin'/`jmax'
        local ++k
    }
}
list subc week sku *1 *2 *3 , sepby(subc week)
     +-------------------------------------------------------------------+
     | subc   week   sku   eta1   which1   eta2   which2   eta3   which3 |
     |-------------------------------------------------------------------|
  1. |    1      1     1    4.3        1      3        2      .        . |
  2. |    1      1     2    4.3        1      3        2      .        . |
     |-------------------------------------------------------------------|
  3. |    1      2     1    5.3        1      4        2      .        . |
  4. |    1      2     2    5.3        1      4        2      .        . |
     |-------------------------------------------------------------------|
  5. |    2      1     3    2.5        3      1        4     12        5 |
  6. |    2      1     4    2.5        3      1        4     12        5 |
  7. |    2      1     5    2.5        3      1        4     12        5 |
     |-------------------------------------------------------------------|
  8. |    2      2     3    3.5        3      2        4     13        5 |
  9. |    2      2     4    3.5        3      2        4     13        5 |
 10. |    2      2     5    3.5        3      2        4     13        5 |
     |-------------------------------------------------------------------|
 11. |    3      1     6     12        6      .        .      .        . |
     |-------------------------------------------------------------------|
 12. |    3      2     6     13        6      .        .      .        . |
     +-------------------------------------------------------------------+
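On the point about Stata's limits, it may help to count how many variables the one-variable-per-product layout would need before generating any of them. A small sketch on the same example data; tag is a hypothetical variable name, and the count should be compared with the variable limit of your Stata flavor (see help limits).

```stata
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end

* flag exactly one observation per distinct (subc, sku) pair
egen byte tag = tag(subc sku)
* the number of flagged observations is the number of eta variables needed
count if tag
display "distinct subc/sku pairs: " r(N)
drop tag
```

For this example the count is 6. In real scanner data with thousands of products, the same count shows directly how wide the dataset would become.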
Answer 3:
clear
input units price sku week store subc
35 4.3 1 1 1 1
23 3 2 1 1 1
12 2.5 3 1 1 2
10 1 4 1 1 2
35 12 5 1 1 2
35 12 6 1 1 3
35 5.3 1 2 1 1
23 4 2 2 1 1
12 3.5 3 2 1 2
10 2 4 2 1 2
35 13 5 2 1 2
35 13 6 2 1 3
end
egen joint = group(subc sku), label
bysort store week : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq
tostring subc sku, replace
gen new = subc + "_"+sku
su joint, meanonly
forval j = 1/`r(max)' {
    local J = new[`j']
    gen eta`J' = .
}
sort subc week store sku
egen joint1 = group(subc week store), label
gen long id = _n
su joint1, meanonly
quietly forval i = 1/`r(max)' {
    su id if joint1 == `i', meanonly
    local jmin = r(min)
    local jmax = r(max)
    forval j = `jmin'/`jmax' {
        local subc = subc[`j']
        local sku = sku[`j']
        replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax'
        replace eta`subc'_`sku' = 0 in `j'/`j'
    }
}
list subc sku store week eta*, sepby(subc)
     +---------------------------------------------------------------------------------+
     | store   week   subc   sku   eta1_1   eta1_2   eta2_3   eta2_4   eta2_5   eta3_6 |
     |---------------------------------------------------------------------------------|
  1. |     1      1      1     2      4.3        0        .        .        .        . |
  2. |     1      1      1     1        0        3        .        .        .        . |
     |---------------------------------------------------------------------------------|
  3. |     1      1      2     4        .        .      2.5        0       12        . |
  4. |     1      1      2     3        .        .        0        1       12        . |
  5. |     1      1      2     5        .        .      2.5        1        0        . |
     |---------------------------------------------------------------------------------|
  6. |     1      1      3     6        .        .        .        .        .        0 |
     |---------------------------------------------------------------------------------|
  7. |     1      2      1     2      5.3        0        .        .        .        . |
  8. |     1      2      1     1        0        4        .        .        .        . |
     |---------------------------------------------------------------------------------|
  9. |     1      2      2     3        .        .        0        2       13        . |
 10. |     1      2      2     5        .        .      3.5        2        0        . |
 11. |     1      2      2     4        .        .      3.5        0       13        . |
     |---------------------------------------------------------------------------------|
 12. |     1      2      3     6        .        .        .        .        .        0 |
     +---------------------------------------------------------------------------------+
Source: https://stackoverflow.com/questions/48818009/populating-new-variable-using-vlookup-with-multiple-criteria-in-another-variable