How can I import specific files?

Posted by 允我心安 on 2019-12-25 01:17:20

Question


I am trying to import hundreds of U.S. county xls files together to form a complete dataset in Stata. The problem is that for every county, I have several files for different years, so that my list of file names looks like this:

county1-year1970.xls
county1-year1975.xls
county2-year1960.xls
county2-year1990.xls

For each county, I only want the file from the most recent year (which varies across counties).

So far, I have written code to loop through each possible file name and, if the file exists, to append the year to a local macro years:

local years = 0
forvalues i = 1/500 {
    forvalues yr = 1900/2018 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 {
            local years `years' `yr'
        }
    }
    /* [code to extract the max value in `years'] */
    import excel county`i'-year`maxyear'.xls, clear
}

The loop seems to work, but it is still missing code that will extract the maximum value from the local list `years'. I want to use that maximum value to import the Excel sheet.

How can I identify the maximum value in a local macro or is there a simpler way to get what I want?


Answer 1:


As you are looping over years from first possible to last possible, all you need is to keep track of the last valid year:

forval i = 1/500 {
    local maxyear  
    forval yr = 1900/2018 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 local maxyear `yr'
    }

    if "`maxyear'" != "" {    
        import excel county`i'-year`maxyear'.xls, clear
    }
}

Otherwise put, keeping a record of all the years that were valid, and then finding the maximum over those, is more work than you need to do. (But notice that as you loop over increasing years, the maximum would just be the last item in your list.)
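
For the more general case the question asks about, where the list is not in increasing order, a running comparison over the list is one way to get the maximum. This is only a minimal sketch: the contents of years below are made up for illustration, and the list is assumed to be non-empty and to contain only numbers.

* hypothetical list of years, deliberately unsorted
local years 1975 1960 1990 1970

* start from the first element and keep the larger value at each step
local maxyear : word 1 of `years'
foreach y of local years {
    if `y' > `maxyear' local maxyear `y'
}

* displays 1990
display `maxyear'

In the question's loop the list would also have to be reset for each county, or years from earlier counties would carry over.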

This answer is close to the question, but @Pearly Spencer's answer is a neater solution in this case.




Answer 2:


The following works for me and is more efficient:

forvalues i = 1 / 2 {
    * the hyphen in the pattern keeps county1 from also matching county10, county11, etc.
    local files `: dir . files "county`i'-*"'
    display "`: word `: word count `files'' of `files''"
}

county1-year1975.xls
county2-year1990.xls

I use the display command here for illustration but you can also use import instead.
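
As a rough sketch of what the import version might look like, assuming the files sit in the current working directory and that every file name follows the countyN-yearYYYY.xls pattern from the question (the macro names n and last are introduced here only for illustration):

forvalues i = 1 / 2 {
    * assumed pattern: a hyphen directly after the county number
    local files : dir . files "county`i'-*.xls"
    * sort explicitly rather than relying on the order dir returns
    local files : list sort files
    local n : word count `files'
    * skip counties that have no file at all
    if `n' > 0 {
        local last : word `n' of `files'
        import excel "`last'", clear
    }
}

Because the years all have four digits, the alphabetically last name is also the most recent year. Each import with the clear option replaces the data in memory, so building the combined dataset still needs a separate save or append step.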

The idea here is that, for each county prefix (county1, county2, etc.), you can collect the matching file names in a local macro using the macro extended function dir. Then you simply count the number of words in that macro and take the last one.

Note that in this case the local macro will already be sorted alphabetically. However, more generally you can sort the items in a macro with the macro extended function list sort.

For example:

local files : list sort files

The following uses Mata to get around the limit on the number of characters a Stata local macro can hold:

forvalues i = 1 / 2 {
    mata: fl = sort(dir(".", "files", "county`i'-*"), 1); st_local("file", fl[rows(fl)])
    display "`file'"
}

This approach will be useful if you have a large number of files, the names of which cannot all fit in a local macro.




Answer 3:


May I borrow Nick's code?

forval i = 1/500 {
    foreach  yr of numlist 2018(-1)1900 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 {
             import excel county`i'-year`yr'.xls, clear
             continue, break
        }
    }
}

Please let me know if this does not work, as I can't test it on my side. The logic is to start with the largest value of yr and count down: the first file found for a county is its most recent one, so the code imports it, breaks out of the inner loop, and moves on to the next county.



Source: https://stackoverflow.com/questions/58721698/how-can-i-import-specific-files
