Question
I can't wrap my head around the following Stata programming problem:
I have a table listing all car purchases by customers and make:
Customer | Make | Price
-----------------------
c1 | m1 | 1
c1 | m1 | 2
c1 | m3 | 1
c2 | m2 | 2
c3 | . | .
I want to transform this into a table with one observation/row per customer, listing the maximum price paid for every make:
Customer | m1 | m2 | m3
-----------------------
c1 | 2 | 0 | 1
c2 | 0 | 2 | 0
c3 | 0 | 0 | 0
How do I achieve this? I know reshape wide, but that doesn't work because of the duplicated c1 | m1 row. Also, the missing values for c3 are causing trouble.
Answer 1:
Depending on what you want to do, I suggest approaching this a little differently. For example, using -bysort- with egen you can add a variable holding the maximum price by customer for each make (this keeps every original row):
bysort Customer Make : egen maxPrice = max( Price )
Or, you can use collapse to find the max price by customer and make:
collapse (max) Price, by( Customer Make )
But, if you really want the table you posted using -reshape- you could run the following:
collapse (max) Price, by( Customer Make )
drop if Price == .
reshape wide Price, i( Customer ) j( Make ) string
renpfix Price
Note that reshape will fail on observations like c3, where Make (the j() variable) is missing along with Price. I dropped these observations in the code above, which also removes c3 from the result. You may choose to do something different, such as keeping them and replacing the missing values with zeros, as in your posted target table.
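The zero-filling alternative mentioned above could be sketched roughly as follows. This is only one possible approach, not necessarily what the answer had in mind: it assumes Make is a string variable, and it uses a hypothetical placeholder value "none" (my own choice, not from the question) so that the c3 row survives the reshape. The input block just recreates the example data from the question.

```stata
clear
* Recreate the example data from the question (Make assumed to be a string)
input str2 Customer str4 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* Hypothetical placeholder so customers without purchases are not lost
replace Make = "none" if missing(Make)

* One row per customer and make, holding the maximum price
collapse (max) Price, by(Customer Make)

* One row per customer, one Price<make> column per make
reshape wide Price, i(Customer) j(Make) string

* Remove the placeholder column, then strip the "Price" prefix
drop Pricenone
renpfix Price

* Makes a customer never bought are missing after reshape; show them as 0
foreach v of varlist m1 m2 m3 {
    replace `v' = 0 if missing(`v')
}

list, noobs
```

The variable list m1 m2 m3 is hard-coded to match the example; with real data you would substitute your own make names, and you should check that no actual make is called "none".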
Source: https://stackoverflow.com/questions/6151020/how-to-aggregate-relational-data-in-stata