stata | 易学教程

Stata: Summary stats with table. Order by N

Question: How can I order the following table in descending order of frequency?

    sysuse auto.dta, clear
    replace make = substr(make,1, strpos(make," ")-2)
    table make, c(N price mean price median price sd price min price max price) format(%9.2f) center

The first observation should be buic or old with N=7. Is there a way to order by frequency? The code above also gives an error saying that there are too many stats(). Is there an alternative procedure that allows more columns?

Answer 1: (In what follows, I followed …
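One possible route, sketched here and not taken from the truncated answer, is to compute the statistics with collapse and sort the result with gsort. The names price_n, price_mean and so on are invented for this illustration. As far as I know, collapse is not subject to the cap on the number of statistics that the older table, contents() syntax imposes.

```stata
sysuse auto, clear
* Same shortening of make as in the question
replace make = substr(make, 1, strpos(make, " ") - 2)

* One row per make, one column per statistic of price
collapse (count) price_n = price (mean) price_mean = price ///
    (median) price_median = price (sd) price_sd = price ///
    (min) price_min = price (max) price_max = price, by(make)

* Descending frequency; ties are broken alphabetically by make
gsort -price_n make
format price_mean price_median price_sd %9.2f
list, noobs clean
```

Because collapse replaces the data in memory, wrapping the block in preserve and restore keeps the original dataset available afterwards.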

Stata: Fail at setting global pathname

Question: I have a profile.do that contains

    global prog "C:\Users\foobar\Google Drive\Cloud\PhD\Projects\Labor Supply\LIAB_QM2_9310_v1_test_dta\prog"

Then I have a different do-file, different.do, where this global is supposed to be used:

    adopath ++ $prog

However, this does not work. So I tried to find the root of the error:

    . display $prog
    C:\Users\foobar\Google invalid name

Using ' instead of " didn't help:

    . global prog 'C:\Users\sdaro\Google Drive\Cloud\PhD\Projects\Labor Supply …
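A likely explanation, offered as a sketch and not as the accepted answer: the global itself is stored correctly, but the path contains spaces, so it has to be quoted again at every point of use. A single quote (') is not a string delimiter in Stata, which is why that attempt did not help.

```stata
* Path copied from the question; the double quotes keep it as one string
global prog "C:\Users\foobar\Google Drive\Cloud\PhD\Projects\Labor Supply\LIAB_QM2_9310_v1_test_dta\prog"

* Without quotes, display tries to evaluate the expanded text as an expression
display "$prog"

* Quote the macro wherever the path is used
adopath ++ "$prog"
```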

Stata: saving regressions coefficients and standard errors in .dta file when there are factor variables

Question: I would like to run several regressions and store their results in a DTA file that I could later use for analysis. My constraints are:

- I cannot install modules (I am writing code for other people and am not sure what modules they have installed).
- Some of the regressors are factor variables.
- Each regression differs only in the dependent variable, so I would like to store that in the final dataset to keep track of which regression the coefficients/variances correspond to.

I am seriously losing sanity …
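A minimal sketch of one way to meet these constraints using only built-in commands (postfile, post, postclose). The auto dataset, the two dependent variables and the regressors are stand-ins chosen for illustration, as are the names depvar, term, b and se.

```stata
sysuse auto, clear
tempname handle
tempfile results
postfile `handle' str32 depvar str32 term double b double se using `results'

foreach y of varlist price mpg {
    quietly regress `y' weight i.rep78
    matrix B = e(b)
    matrix V = e(V)
    * Column names carry the factor-variable labels, e.g. 2.rep78
    local names : colnames B
    local k = colsof(B)
    forvalues j = 1/`k' {
        local term : word `j' of `names'
        * One row per coefficient: dependent variable, term, estimate, std. error
        post `handle' ("`y'") ("`term'") (B[1,`j']) (sqrt(V[`j',`j']))
    }
}
postclose `handle'

use `results', clear
list, noobs sepby(depvar)
```

Reading coefficients by column position avoids having to parse factor-variable names; base levels appear as rows with a zero estimate and zero standard error.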

R - Keep first observation per group identified by multiple variables (Stata equivalent “bys var1 var2 : keep if _n == 1”)

Question: I currently face a problem in R that I know exactly how to deal with in Stata, but I have wasted over two hours trying to solve it in R. Using the data.frame below, I want to obtain exactly the first observation per group, where groups are formed by multiple variables and have to be sorted by another variable, i.e. the data.frame mydata obtained by:

    id <- c(1,1,1,1,2,2,3,3,4,4,4)
    day <- c(1,1,2,3,1,2,2,3,1,2,3)
    value <- c(12,10,15,20,40,30,22,24,11,11,12)
    mydata <- data.frame(id, …

From edge or arc list to clusters in Stata

Question: I have a Stata dataset that represents connections between users and looks like this:

    src_user  linked_user
    1         2
    2         3
    3         5
    1         4
    6         7

I would like to get something like this:

    user  cluster
    1     1
    2     1
    3     1
    4     1
    5     1
    6     2
    7     2

where isid user evaluates to TRUE and I have grouped all users into disjoint clusters. I have tried thinking of this as a reshape problem, but without much success. None of the user-written SNA commands seem to accomplish this as far as I can tell. What is the most efficient way of …
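One way to get disjoint clusters with built-in commands is to propagate the smallest user id across edges until nothing changes. The sketch below uses the five edges from the question; the names edge, endpoint, previous and cluster_id are invented for illustration, and it makes no claim to being the most efficient method.

```stata
clear
input src_user linked_user
1 2
2 3
3 5
1 4
6 7
end

* Long format: one row per (edge, endpoint)
gen long edge = _n
rename (src_user linked_user) (user1 user2)
reshape long user, i(edge) j(endpoint)

* Start with each user's own id as the label, then spread the minimum
gen long cluster = user
local changed = 1
while `changed' {
    gen long previous = cluster
    * Both endpoints of an edge share the smaller label
    bysort edge (cluster): replace cluster = cluster[1]
    * All rows for the same user share the smallest label seen so far
    bysort user (cluster): replace cluster = cluster[1]
    quietly count if cluster != previous
    local changed = r(N)
    drop previous
}

* One row per user, clusters renumbered 1, 2, ...
bysort user: keep if _n == 1
egen long cluster_id = group(cluster)
keep user cluster_id
rename cluster_id cluster
isid user
list, noobs
```

The loop runs once per step of the longest chain, so long chains in large networks may need many passes.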

Extracting random integers from a given variable

Question: My task is to pick 3 years at random within a time interval, 500 times, for simulation purposes. More specifically, I want to select 3 random years from 2007 to 2016 (10 years), for example 2008, 2012 and 2014. So it's more or less equivalent to extracting random integers from a variable. My solution is the following:

    * The following (empty) dataset will be used to append the results of the Monte Carlo simulations
    use "recession_parms.dta", clear
    save "ind_simulations.dta", replace
    forvalues i …
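A loop-free alternative, sketched under the assumption that the three years must be distinct within each draw: build all 500 x 10 combinations, attach a uniform random number to each, and keep the three smallest per simulation. This is not the asker's code; the seed and the names sim, year and u are arbitrary.

```stata
clear
set seed 12345
set obs 500
gen int sim = _n

* Ten candidate years per simulation: 2007, ..., 2016
expand 10
bysort sim: gen int year = 2006 + _n

* Random order within each simulation; keep the first three years
gen double u = runiform()
bysort sim (u): keep if _n <= 3

drop u
sort sim year
list in 1/9, noobs sepby(sim)
```

The resulting dataset has 1,500 rows (three years for each of the 500 simulations) and can be merged onto other data by sim.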

Stata: compare two datasets and drop different variables

Question: I have two large datasets (more than 1000 variables in each), one of which has all the variables of the second, plus additional variables. I would like to get a list of all these additional variables, then drop them and append one dataset to the other. I have tried the command dta_equal, but ran into the same problem described here: http://www.stata.com/statalist/archive/2011-08/msg00308.html I guess append, keep() cannot do what I want directly, i.e., it cannot append a dataset while drop …
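A minimal sketch of one approach: read the variable list of the smaller dataset with describe using, take the set difference with macro list operations, drop the extras and append. The two tempfiles built from the auto data are stand-ins for the real datasets, and the macro names are hypothetical.

```stata
* Stand-ins for the two datasets: big has every variable, small only three
sysuse auto, clear
tempfile big small
save `big'
keep make price mpg
save `small'

* Variable names of the smaller dataset, read without loading it
quietly describe using "`small'", varlist
local smallvars `r(varlist)'

* Variable names of the larger dataset
use "`big'", clear
quietly ds
local bigvars `r(varlist)'

* Variables that exist only in the larger dataset
local extra : list bigvars - smallvars
display "`extra'"

* Drop them (if there are any) and stack the two datasets
if "`extra'" != "" {
    drop `extra'
}
append using "`small'"
```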

How do I convert (daily) date to month date?

Question: In Stata, how do I convert a date of the form 09mar2005 00:00:00 to a month-year variable? If it matters, the date format is %tc. What I have in mind is to plot monthly averages (instead of the daily averages I have) of variables across time.

Answer 1: To get where you are now, you or somebody else may have done something like this:

    clear
    set obs 1
    gen earlier = "09mar2005 00:00:00"
    gen double nowhave = clock(earlier, "DMY hms")
    format nowhave %tc
    list

    +-----------------------------------------+ …
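The conversion itself can be sketched in two steps: dofc() turns the %tc clock value into a daily date, and mofd() turns that into a monthly date. The variable name mdate is an assumption made for this example; nowhave is the variable from the snippet above.

```stata
clear
set obs 1
gen earlier = "09mar2005 00:00:00"
gen double nowhave = clock(earlier, "DMY hms")
format nowhave %tc

* Clock (milliseconds) -> daily date -> monthly date
gen mdate = mofd(dofc(nowhave))
format mdate %tm
list
```

With a monthly variable in place, monthly averages could then be obtained with something like collapse (mean) varname, by(mdate), where varname stands for whichever variable is being averaged.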

Create comparison-of-means table with multiple variables by multiple groups comparing to total mean

Question: I'm looking for a way to create a comparison-of-means (t-test) table from the output of a tabstat command. Basically, I want to know whether the mean of each group is statistically significantly different from the overall mean of the variable. I have 75 variables across 15 groups for a total of 1125 t-tests, so doing them one at a time is out of the question. I could always write a loop for the tests, but I was wondering if there was a command similar to tabstat that would make the table for me.

Stata. How to match values in 1:m relationship?

Question: I have two data sets. The first one is:

    countyGroup  income  other_data_
    1            20990   …
    2            25622   …
    3            24289   …
    4            27391   …
    5            23326   …
    6            19261   …
    7            15197   …
    8            11132   …

The second one is:

    countyGroup  state  county  other_data
    1            IL     123     …
    1            IL     123     …
    2            MI     365     …
    1            IL     123     …
    3            AK     65      …
    4            IL     546     …
    5            MI     689     …
    6            AK     32      …

Variable countyGroup uniquely identifies both state and county. The second data set contains countyGroup, state and county. The first data set contains only countyGroup. I need to generate two variables ( …
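Since the question is cut off, the following is only a sketch under the assumption that the goal is to attach state and county to the income data by countyGroup. With the first dataset in memory (one row per countyGroup) and the second as the using file (possibly many rows per countyGroup), that is a merge 1:m. The other_data columns are omitted and the tempfile name is arbitrary.

```stata
* Second dataset: several rows per countyGroup
clear
input countyGroup str2 state county
1 IL 123
1 IL 123
2 MI 365
1 IL 123
3 AK 65
4 IL 546
5 MI 689
6 AK 32
end
tempfile second
save `second'

* First dataset: one row per countyGroup
clear
input countyGroup income
1 20990
2 25622
3 24289
4 27391
5 23326
6 19261
7 15197
8 11132
end

* Each income row is matched to every row of the same countyGroup
merge 1:m countyGroup using `second'
sort countyGroup
list, noobs sepby(countyGroup)
```

Groups 7 and 8 have no counterpart in the second dataset and end up with _merge equal to 1. If one row per countyGroup is wanted instead, the second dataset could first be reduced to distinct countyGroup, state and county rows with duplicates drop and then joined with merge 1:1.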