How to Aggregate Relational Data in Stata?

Question

I can't wrap my head around the following Stata programming problem:

I have a table listing all car purchases by customers and make:

Customer | Make | Price
-----------------------
      c1 |   m1 |     1
      c1 |   m1 |     2
      c1 |   m3 |     1
      c2 |   m2 |     2
      c3 |    . |     .

I want to transform this into a table with one observation/row per customer, listing the maximum price paid for every make:

Customer | m1 | m2 | m3
-----------------------
      c1 |  2 |  0 |  1
      c2 |  0 |  2 |  0
      c3 |  0 |  0 |  0

How do I achieve this? I know reshape wide, but it doesn't work here because of the duplicated c1 | m1 rows. The missing values for c3 are also causing trouble.

Answer 1:

Depending on what you want to do, I suggest approaching this a little differently. For example, using -bysort- with -egen- you can add a variable holding the maximum price by customer for each make, while keeping every original row:

bysort Customer Make : egen maxPrice = max( Price )

Or, you can use collapse to find the max price by customer and make:

collapse (max) Price, by( Customer Make )

But, if you really want the table you posted using -reshape- you could run the following:

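* Keep the maximum price for each customer and make
collapse (max) Price, by( Customer Make )
* Drop rows with no purchase (customer c3 in the example)
drop if missing( Price )
* One row per customer, one Price<make> column per make
reshape wide Price, i( Customer ) j( Make ) string
* Strip the "Price" prefix so the columns are named m1, m2, m3
renpfix Price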

Note that reshape will fail on rows like c3, where Make (the j() variable) and Price are missing. I dropped these observations in the code above, which also removes c3 from the result. You may prefer to do something different, such as replacing the missing data with zeros as in your posted target table.
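If you want c3 to stay in the result with zeros, one possible approach is to set the customer list aside before dropping the empty rows, and merge it back after the reshape. The following is only a sketch under some assumptions: it rebuilds the example data with -input-, treats Make as a string variable that is empty for c3, and uses the -rename- group syntax of Stata 12 or later in place of -renpfix-. The tempfile name `customers' is made up for illustration.

```stata
* Sketch only: rebuild the example data (assumes Make is a string, empty for c3)
clear
input str2 Customer str2 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* Remember every customer before the rows without a purchase are dropped
preserve
keep Customer
duplicates drop
tempfile customers
save `customers'
restore

* One row per customer and make, holding the maximum price
drop if missing(Make)
collapse (max) Price, by(Customer Make)

* One row per customer; columns Pricem1, Pricem2, Pricem3
reshape wide Price, i(Customer) j(Make) string

* Strip the "Price" prefix (Stata 12+ rename group syntax)
rename Price* *

* Bring back customers without purchases, then turn missing prices into 0
merge 1:1 Customer using `customers', nogenerate
sort Customer
foreach v of varlist m* {
    replace `v' = 0 if missing(`v')
}

list, noobs
```

With the example data this should leave three rows, one per customer, with zeros wherever a customer never bought a given make. Whether zero is the right stand-in for "no purchase" depends on what you do with the table afterwards; leaving those cells missing may be safer if you later compute averages.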

Source: https://stackoverflow.com/questions/6151020/how-to-aggregate-relational-data-in-stata

Tags

aggregate

stata
