Looking up data within a file versus merging

Question

I have a file that records the rating that teacher X gives to teacher Y and the date on which it occurs:

clear 
rating_id   RatingTeacher   RatedTeacher  Rating          Date     
  1              15             12          1          "1/1/2010"
  2              12             11          2          "1/2/2010"
  3              14             11          3          "1/2/2010"
  4              14             13          2          "1/5/2010"
  5              19             11          4          "1/6/2010"
  5              11             13          1          "1/7/2010"
 end

For each rating, I want to look back through the history and find how many times the RatingTeacher had themselves been rated before the date of that rating, and the cumulative score of those earlier ratings. The result would look like this:

rating_id   RatingTeacher   RatedTeacher  Rating          Date      TimesRated    CumulativeRating  
  1              15             12          1          "1/1/2010"       0              0
  2              12             11          2          "1/2/2010"       1              1
  3              14             11          3          "1/2/2010"       0              0
  4              14             13          2          "1/5/2010"       0              0
  5              19             11          4          "1/6/2010"       0              0
  5              11             13          1          "1/7/2010"       3              9
 end

I have been merging the dataset with itself to get this to work, and it is fine. I was wondering whether there is a more efficient way to do this within the file.

Answer 1:

In your input data, I guess that the last rating_id should be 6 and that dates are MDY. Statalist members are asked to use dataex (SSC) to set up data examples. This isn't Statalist, but there is no reason for lower standards to apply. See the Statalist FAQ.

I rarely see even programmers be precise about what they mean by "efficient": whether it means fewer lines of code, less use of memory, more speed, something else, or is just some all-purpose term of praise. The code below loops over observations, which can certainly be slow for large datasets. More in this paper.

We can't compare with your merge solution because you don't give the code.

clear 
input rating_id RatingTeacher RatedTeacher Rating str8 SDate 
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
6 11 13 1 "1/7/2010"
end 
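* convert the MDY string to a numeric daily date so that dates can be compared
gen Date = daily(SDate, "MDY") 
sort Date 

* for each observation, count the earlier ratings received by that observation's rater
gen Wanted = . 
quietly forval i = 1/`=_N' { 
    count if Date <  Date[`i'] & RatedT == RatingT[`i'] 
    replace Wanted = r(N) in `i' 
} 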

list, sep(0)  

     +---------------------------------------------------------------------+
     | rating~d   Rating~r   RatedT~r   Rating      SDate    Date   Wanted |
     |---------------------------------------------------------------------|
  1. |        1         15         12        1   1/1/2010   18263        0 |
  2. |        2         12         11        2   1/2/2010   18264        1 |
  3. |        3         14         11        3   1/2/2010   18264        0 |
  4. |        4         14         13        2   1/5/2010   18267        0 |
  5. |        5         19         11        4   1/6/2010   18268        0 |
  6. |        6         11         13        1   1/7/2010   18269        3 |
     +---------------------------------------------------------------------+
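The loop above returns only the count. As a minimal sketch, not part of the original answer, the same loop can also return the cumulative score by swapping count for summarize with the meanonly option, which leaves the number of matching observations in r(N) and their total in r(sum). The variable names TimesRated and CumulativeRating are taken from the question; the explicit guard for the case of no earlier ratings is a defensive assumption.

```stata
clear
input rating_id RatingTeacher RatedTeacher Rating str8 SDate
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
6 11 13 1 "1/7/2010"
end
gen Date = daily(SDate, "MDY")
format Date %td
sort Date rating_id

gen TimesRated = .
gen CumulativeRating = .
quietly forval i = 1/`=_N' {
    * ratings received by this row's rater strictly before this row's date
    summarize Rating if Date < Date[`i'] & RatedTeacher == RatingTeacher[`i'], meanonly
    * copy the results into locals before any other command runs
    local n = r(N)
    local s = cond(r(N) > 0, r(sum), 0)
    replace TimesRated = `n' in `i'
    replace CumulativeRating = `s' in `i'
}

list rating_id RatingTeacher RatedTeacher Rating Date TimesRated CumulativeRating, sep(0)
```

For the example data this should reproduce the two columns requested in the question: rating_id 2 gets 1 and 1, rating_id 6 gets 3 and 9, and every other row gets 0 and 0. It is still a loop over observations, so the caveat about speed on large datasets applies unchanged.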

Answer 2:

The building block is that the rater and ratee are a pair. You can use egen's group() to give a unique ID to each rater-ratee pair.

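* rater is RatingTeacher, ratee is RatedTeacher; Date is assumed to be a numeric daily date
egen pair = group(RatingTeacher RatedTeacher)
bysort pair (Date): gen timesRated = _n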
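As a rough sketch of the same by-group idea applied to the ratee alone rather than to the pair, running totals of the ratings each teacher has received can be built without a loop. This is an illustration under assumptions, not the answerer's code: ReceivedSoFar and ScoreSoFar are hypothetical variable names, Date is the numeric date built as in Answer 1, and rating_id is used only to break ties between ratings on the same date.

```stata
clear
input rating_id RatingTeacher RatedTeacher Rating str8 SDate
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
6 11 13 1 "1/7/2010"
end
gen Date = daily(SDate, "MDY")
format Date %td

* within each rated teacher, in date order: running count and running sum of ratings received
bysort RatedTeacher (Date rating_id): gen ReceivedSoFar = _n
by RatedTeacher: gen ScoreSoFar = sum(Rating)

list RatedTeacher Date Rating ReceivedSoFar ScoreSoFar, sepby(RatedTeacher)
```

For teacher 11 the running totals reach 3 ratings and a score of 9, which is the history that rating_id 6 needs to see. These totals are indexed by the rated teacher, so attaching them to the row where that teacher acts as the rater still requires a lookup or merge on teacher and date.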

Source: https://stackoverflow.com/questions/36212501/looking-up-data-within-a-file-versus-merging

Tags

stata