Question
I am trying to find the four biggest values of a variable in Stata, because I want to calculate the industry concentration of different groups based on sales. I have firm sales for multiple years, and the firms belong to different groups based on industry and country.
Thus, I would like to find:
industry concentration = (sum of the 4 biggest sales values in one year for one industry-and-country group) / (sum of all sales in that year for the same industry-and-country group)
I have about 10000 firms for about 10 years:
firms country year industry sales
a usa 1 1 300
a usa 2 1 4000
b ger 1 1 200
b ger 2 1 400
c usa 1 1 100
c usa 2 1 300
d usa 1 1 400
d usa 2 1 200
e usa 1 1 7000
e usa 2 1 900
f ger 1 2 100
f ger 2 2 700
h ger 1 2 700
h ger 2 2 600
I know how to find the sum of sales per industry-country-year-group:
bysort country industry year: egen sum_sales = sum(sales)
Answer 1:
The sum of the four biggest is
bysort country industry year (sales): generate four_biggest_sales = sales[_N] + ///
sales[_N-1] + sales[_N-2] + sales[_N-3]
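provided that no values of sales are missing. If a group has only three values, sales[_N-3] refers to an observation before the first one and evaluates to missing, so in place of that term you'd need
max(0, sales[_N-3])
with similar corrections for the cases of two values, one value or none.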
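The small-group correction can be sketched as follows. This is a minimal sketch that assumes sales is nonnegative and never missing. The rows are the year-1 observations from the question, except firm g, which is a made-up fifth firm added so that one group has more than four members; total_sales and concentration are illustrative variable names.

```stata
* Year-1 rows from the question, plus firm g (hypothetical, for illustration)
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
b ger 1 1 200
c usa 1 1 100
d usa 1 1 400
e usa 1 1 7000
f ger 1 2 100
h ger 1 2 700
g usa 1 1 50
end

* sales[_N-k] is missing when the group has k or fewer observations, and
* max() ignores missing arguments, so each absent term contributes 0.
* Assumes sales >= 0 and no missing sales.
bysort country industry year (sales): generate four_biggest_sales = ///
    sales[_N] + max(0, sales[_N-1]) + max(0, sales[_N-2]) + max(0, sales[_N-3])

* Group total and the four-firm concentration ratio
bysort country industry year: egen total_sales = total(sales)
generate concentration = four_biggest_sales / total_sales
```

For the usa, industry 1 group this gives 7800 / 7850, while any group with four or fewer firms gets a ratio of 1. The `///` continuation works in a do-file; when typing interactively, put each command on one line.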
This all follows from basic syntax for the by prefix. See this article in the Stata Journal for a tutorial.
If there are missing values, which Stata sorts after all nonmissing values so that they would occupy sales[_N], they can be segregated by
generate isnotmiss = !missing(sales)
bysort isnotmiss country industry year (sales): generate four_biggest_sales = sales[_N] + ///
sales[_N-1] + sales[_N-2] + sales[_N-3]
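With this approach, observations whose sales are missing end up with a missing four_biggest_sales. If the group's value is wanted on every row, one possible follow-up is to sort the result within the original group and copy the first value down. The sketch below uses the question's usa rows for year 1 plus a hypothetical firm x with missing sales.

```stata
* usa, industry 1, year 1 rows from the question, plus firm x (hypothetical)
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
c usa 1 1 100
d usa 1 1 400
e usa 1 1 7000
x usa 1 1 .
end

generate isnotmiss = !missing(sales)
bysort isnotmiss country industry year (sales): generate four_biggest_sales = ///
    sales[_N] + sales[_N-1] + sales[_N-2] + sales[_N-3]

* Missing values sort last, so the first observation in each group holds
* the nonmissing sum (if the group has one); copy it to every row.
bysort country industry year (four_biggest_sales): ///
    replace four_biggest_sales = four_biggest_sales[1]
```

Here every row of the group, including firm x, ends up with 7800.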
Source: https://stackoverflow.com/questions/17771069/calculate-industry-concentration-based-on-four-biggest-numbers