Stata | 易学教程

Keeping all the records for specific IDs

Question: I have the following dataset:

    clear
    input id code cost
    1 15342 18
    2 15366 12
    1 16786 32
    2 15342 12
    3 12345 45
    4 23453 345
    1 34234 23
    2 22223 12
    4 22342 64
    3 23452 23
    1 23432 22
    end

I want to get the output below:

    id   code   cost
     1  15342     18
     2  15366     12
     1  16786     32
     2  15342     12
     1  34234     23
     2  22223     12
     1  23432     22

I tried to use this command, but it did not work:

    keep if id = (1|2)

How can I keep all the records for specific IDs?

Answer 1: The following works for me:

    keep if id == 1 | id == 2

Alternatively, you…
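The rest of the answer is cut off, but one common idiom for the same filter is inlist(). The original attempt fails because = is not the equality test inside if (== is). Also, (1|2) is a logical expression that evaluates to 1, so even keep if id == (1|2) would keep only id 1. Here is a minimal sketch using the question's data:

```stata
clear
input id code cost
1 15342 18
2 15366 12
1 16786 32
2 15342 12
3 12345 45
4 23453 345
1 34234 23
2 22223 12
4 22342 64
3 23452 23
1 23432 22
end

* inlist(x, a, b, ...) is 1 when x equals any of the listed values
keep if inlist(id, 1, 2)
list, noobs
```

inlist() becomes more convenient than a chain of | conditions as the number of IDs grows.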

Shuffle One Variable Within Group

Question: This question is an extension of the excellent answer provided by Robert Picard here: How to Randomly Assign to Groups of Different Sizes

We have this dataset, which is the same as in the previous question but adds the year variable:

    sysuse census, clear
    keep state region pop
    order state pop region
    decode region, gen(reg)
    replace reg="NCntrl" if reg=="N Cntrl"
    drop region
    gen year=20
    replace year=30 if _n>15
    replace year=40 if _n>35

If I just wanted to randomly reassign reg's across all…
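One way to shuffle reg within each year, while leaving every other variable in place, is to save a copy of reg in random order within year and merge it back by position within year. This is a sketch under that assumption, not necessarily the linked answer's method. The seed and the names obs, pos, u, reg_shuffled and the tempfile are illustrative choices.

```stata
sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
gen year=20
replace year=30 if _n>15
replace year=40 if _n>35

set seed 12345
* remember the original row order
gen long obs = _n
* position of each row within its year, in the original order
bysort year (obs): gen long pos = _n

* build a copy of reg in random order within each year
preserve
keep year reg
gen double u = runiform()
bysort year (u): gen long pos = _n
keep year pos reg
rename reg reg_shuffled
tempfile shuffled
save `shuffled'
restore

* row k of each year receives the k-th value of the shuffled copy
merge 1:1 year pos using `shuffled', nogenerate
sort obs
drop pos
```

Because the merge key is the within-year position, each year keeps exactly the same mix of regions. Only the pairing of regions with states changes.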

coefplot: Putting names of regressions with vertical option

Question: Currently, I have the following code:

    sysuse auto, clear
    estimates clear
    gen year=.
    replace year=1988 if foreign==0
    replace year=1989 if foreign==1
    regress price mpg trunk length turn if year==1988
    estimates store Year1988
    regress price mpg trunk length turn if year==1989
    estimates store Year1989
    coefplot Year1988 Year1989, vertical keep(trunk) xline(0) xlabel("")

This generates a graph. However, I want to put a custom name on each stored set of regression results. How can I do this? I tried…
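Here is a sketch that assumes the community-contributed coefplot (ssc install coefplot) and its aseq() and swapnames options. Each model's results get a custom equation name, and swapping equation and coefficient names then lets those custom names label the category axis. The labels "Domestic 1988" and "Foreign 1989" are placeholders. Because vertical puts the coefficient values on the y axis, the zero reference line is drawn here with yline(0) instead of xline(0).

```stata
* requires the community-contributed command: ssc install coefplot
sysuse auto, clear
estimates clear
gen year = cond(foreign == 0, 1988, 1989)

regress price mpg trunk length turn if year == 1988
estimates store Year1988
regress price mpg trunk length turn if year == 1989
estimates store Year1989

* aseq() assigns a custom equation name to each model's coefficients;
* "\" puts both models into a single plot series;
* swapnames then swaps coefficient and equation names, so the custom
* names, instead of "trunk", become the category labels
coefplot (Year1988, aseq("Domestic 1988") \ Year1989, aseq("Foreign 1989")), ///
    vertical keep(trunk) swapnames yline(0)
```

If you only need the names in the legend, coefplot's plot-level label() option or the global plotlabels() option are simpler alternatives.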

Create new string variable with partial matching of another

Question: I am using Stata 15, and I would like to create a new string variable based on the contents of another. Consider the following toy variable:

    clear
    input str18 string
    "a b c"
    "d e f"
    "g h i"
    end

I know I can use the regexm() function to detect whether any of a, c, d or g occurs:

    generate new = regexm(string, "a|c|d|g")
    list

      | string   new |
      |--------------|
      | a b c      1 |
      | d e f      1 |
      | g h i      1 |

However, how can I get the following?

      | string   new |
      |----------------|
      | a b c    a c |
      | d e f      d |
      | g h i …
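regexm() only returns 1 or 0, so it cannot list what matched. A minimal sketch instead loops over the target letters and appends each one that occurs. This assumes the letters are space-separated tokens as in the toy data, which is why string is padded with spaces for a whole-word match. The local macro targets is an illustrative name.

```stata
clear
input str18 string
"a b c"
"d e f"
"g h i"
end

* the letters from the pattern "a|c|d|g"
local targets a c d g
generate new = ""
foreach t of local targets {
    * append the letter when it appears as a whole token in string;
    * strtrim() removes the leading space added on the first match
    replace new = strtrim(new + " " + "`t'") if strpos(" " + string + " ", " `t' ") > 0
}
list
```

The matches appear in the order given in targets, not in the order they occur in string.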

Identify first event or last non-event

Question: I have the following data in Stata:

    clear
    * Input data
    input float id str7 event time
    1 "." 10
    1 "." 20
    1 "1" 30
    1 "0" 40
    1 "." 50
    2 "0" 10
    2 "0" 20
    2 "0" 30
    2 "0" 40
    2 "0" 50
    3 "1" 10
    3 "1" 20
    3 "0" 30
    3 "." 40
    3 "." 50
    4 "." 10
    4 "." 20
    4 "." 30
    4 "." 40
    4 "." 50
    5 "1" 10
    5 "1" 20
    5 "1" 30
    5 "1" 40
    5 "1" 50
    end

Below is the data I hope to get to:

    * Input data
    input float id str7 event time
    1 1 30
    2 0 50
    3 1 10
    4 . 50
    5 1 10
    end

My aim is to take the first row for…
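Judging from the desired output, each id keeps its earliest row with event "1", or its last row if it never has one. A minimal sketch under that reading follows. The helper variables hit and anyhit are illustrative names.

```stata
clear
input float id str7 event time
1 "." 10
1 "." 20
1 "1" 30
1 "0" 40
1 "." 50
2 "0" 10
2 "0" 20
2 "0" 30
2 "0" 40
2 "0" 50
3 "1" 10
3 "1" 20
3 "0" 30
3 "." 40
3 "." 50
4 "." 10
4 "." 20
4 "." 30
4 "." 40
4 "." 50
5 "1" 10
5 "1" 20
5 "1" 30
5 "1" 40
5 "1" 50
end

* flag rows where the event occurred
generate byte hit = event == "1"
* does this id have any event at all?
bysort id: egen byte anyhit = max(hit)
* candidates: event rows if the id has one, otherwise every row
keep if hit | !anyhit
* earliest event row, or the latest row when there is no event
bysort id (time): keep if cond(anyhit, _n == 1, _n == _N)
drop hit anyhit
list, noobs
```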

Replacing observations with a previous set observation

Question: I have three columns. The first, F, identifies groups of observations. The second, T, orders the observations within the same F. The third, Q, is a numerical value. I'd like every value of Q whose T exceeds a certain cutoff to be replaced by the value of Q at that fixed T, within the same F. For example, I'd like all values of Q within the same F that have T > 6 to equal whatever value Q takes for that F at T = 6. If an F has a Q value of 40 at T=6 and a Q value of 50 at T…
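A minimal sketch on made-up data follows: two groups with T = 1 to 8 and the cutoff T = 6 from the example. The value at the cutoff is spread to every row of its group with egen, and later rows are then overwritten. The helper Q6 is an illustrative name.

```stata
* hypothetical toy data: two groups F with periods T = 1, ..., 8
clear
set obs 16
generate F = ceil(_n / 8)
generate T = mod(_n - 1, 8) + 1
generate Q = 10 * F + T

* copy the value of Q at T == 6 to every row of the same F
bysort F: egen Q6 = max(cond(T == 6, Q, .))
* overwrite later periods with that value (groups without T == 6 are left alone)
replace Q = Q6 if T > 6 & !missing(Q6)
drop Q6
sort F T
list, sepby(F)
```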

cycling Ranksum on Stata

Question: I have some data on two different groups of patients, exported automatically from a diagnostic tool. The variables are named automatically by the tool (e.g. L1DensityWholeImage, L1WholeImageSHemi, L1WholeImageIHemi, L1WholeETDRS, [...], DeepL2StartLayer, L2Startoffsetum, L2EndLayer, [...], Perimeter, AcircularityIndex). I have to perform a rank-sum test (or Mann-Whitney U test) on all of the variables (more than 80) by group. Normally, I would have to write each analysis separately, like this:
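A foreach loop avoids writing each test by hand. The sketch below uses the auto data as a stand-in, with foreign as the group and four arbitrary measures. With the real export, you might loop over a varlist range such as L1DensityWholeImage-AcircularityIndex, but that assumes the variables are stored in that order. The z statistic from each test is collected in a matrix R, along with a two-sided p-value from the normal approximation.

```stata
sysuse auto, clear

* stand-ins for the exported measures
local measures price mpg weight length
local k : word count `measures'
matrix R = J(`k', 2, .)
matrix colnames R = z p
matrix rownames R = `measures'

local i = 0
foreach v of local measures {
    local ++i
    quietly ranksum `v', by(foreign)
    * store z and the two-sided p-value from the normal approximation
    matrix R[`i', 1] = r(z)
    matrix R[`i', 2] = 2 * normal(-abs(r(z)))
}
matrix list R
```

With more than 80 tests, you may also want to adjust the p-values for multiple comparisons.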

How to Randomly Assign to Groups of Different Sizes

Question: Say I have a dataset and I want to assign observations to different groups, with the size of each group determined by the data. For example, suppose that this is the data:

    sysuse census, clear
    keep state region pop
    order state pop region
    decode region, gen(reg)
    replace reg="NCntrl" if reg=="N Cntrl"
    drop region
    *Create global with regions
    global region NE NCntrl South West
    *Count the number in each region
    bys reg (pop): gen reg_N=_N
    tab reg

There are four reg groups, all of different sizes. Now, I…
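Here is a minimal sketch of one approach, not necessarily the accepted answer. It puts the observations in random order, then hands out the region labels in consecutive blocks whose sizes equal the original group sizes. Each new group therefore has exactly as many members as the corresponding reg group. The seed and the variable names u and newreg are illustrative.

```stata
sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
global region NE NCntrl South West
bys reg (pop): gen reg_N=_N

set seed 2024
* random order of observations
gen double u = runiform()
sort u
gen newreg = ""
local start = 1
foreach r of global region {
    * size of this region in the original assignment
    quietly count if reg == "`r'"
    local stop = `start' + r(N) - 1
    * the next r(N) randomly ordered observations get this label
    quietly replace newreg = "`r'" in `start'/`stop'
    local start = `stop' + 1
}
tab reg newreg
```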

Add a column of differences to tables of summary statistics in Stata

Question: If I make a two-way summary statistics table in Stata using table, can I add another column that is the difference of two other columns? Say that I have three variables (a, b, c). I generate quintiles of a and b, then generate a two-way table of the mean of c in each quintile-by-quintile cell. I would like to add a sixth column, after the five quintiles of b, that holds the difference in mean c between the top and bottom quintiles of b for each quintile of a. I can generate the table of mean c for each quintile…
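One workaround is to compute the cell means yourself instead of extending table's output. The sketch below uses collapse, then reshapes the result into a grid, so the difference column is just another variable. The data are simulated, and the seed, sample size and the variables qa, qb and diff are made up for illustration.

```stata
* hypothetical simulated data for variables a, b, c
clear
set seed 1
set obs 1000
generate a = rnormal()
generate b = rnormal()
generate c = a + b + rnormal()

* quintiles of a and b
xtile qa = a, nq(5)
xtile qb = b, nq(5)

* mean of c in each quintile-by-quintile cell
collapse (mean) c, by(qa qb)
* one row per quintile of a; columns c1-c5 for the quintiles of b
reshape wide c, i(qa) j(qb)
* the sixth column: top minus bottom quintile of b
generate diff = c5 - c1
list, noobs
```

collapse replaces the data in memory, so wrap it in preserve/restore if you need the original observations afterwards.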

In Stata, how can I combine box plots of different widths?

Question: I'm trying to combine several box plots across categories of different sizes. Here is an example illustrating the problem:

    sysuse auto
    graph box mpg, by(rep78, rows(1)) name(g1, replace)
    graph box mpg, by(foreign, rows(1)) name(g2, replace)
    graph combine g1 g2, ycom r(2)

This gives me a combined graph. Everything works according to the manual so far, but I have two problems with this output. First, aesthetics: personally, I think plots with the same box width across rows would look better. Second,…
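If a single row of boxes is acceptable, one sketch avoids graph combine entirely. It stacks two copies of the data, labels each copy's categories, and draws everything with over() in one graph box call, so Stata sizes all boxes alike. This assumes a one-graph layout rather than the two rows in the question. The variable names copy, panel and group are illustrative.

```stata
sysuse auto, clear
* stack two copies of the data; copy == 1 marks the duplicates
expand 2, generate(copy)
generate str13 panel = cond(copy == 0, "Repair record", "Car type")
generate str10 group = ""
* first copy: one box per repair record
replace group = "rep78 = " + string(rep78) if copy == 0 & !missing(rep78)
* second copy: one box per car type
replace group = cond(foreign == 1, "Foreign", "Domestic") if copy == 1
drop if group == ""
* all boxes are drawn in one graph, so they share the same width;
* nofill omits the empty group-by-panel combinations
graph box mpg, over(group) over(panel) nofill
```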