How to combine state-level shapefiles from the united states census bureau into a nationwide shape

牧云@^-^@ 提交于 2019-12-21 01:06:32

问题


The census bureau doesn't provide a nationwide shapefile of public use microdata areas (the smallest geography available on the American Community Survey). I tried combining them all with a few different methods, but even the one that de-dupes identifiers breaks once it hits California. Am I doing something silly or does this require a difficult workaround? Here's code to reproduce up to the point where things break.

library(taRifx.geo)
library(maptools)

td <- tempdir() ; tf <- tempfile()
setInternet2( TRUE )
download.file( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/" , tf )

al <- readLines( tf )
tl <- al[ grep( "geo/tiger/TIGER2014/PUMA/tl_2014_" , al ) ]
fp <- gsub( "(.*)geo/tiger/TIGER2014/PUMA/tl_2014_([0-9]*)_puma10\\.zip(.*)" , "\\2" , tl )

# get rid of alaska
fp <- fp[ fp != '02' ]

af <- paste0( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/tl_2014_" , fp , "_puma10.zip" )

d <- NULL
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )
    if ( is.null( d ) ) d <- b else d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b , fix.duplicated.IDs = TRUE )
}

# Error in `row.names<-.data.frame`(`*tmp*`, value = c("d.0", "d.1", "d.2",  : 
  # duplicate 'row.names' are not allowed
# In addition: Warning message:
# non-unique values when setting 'row.names': ‘d.0’, ‘d.1’, ‘d.10’, ‘d.11’, ‘d.12’, ‘d.13’, ‘d.14’, ‘d.15’, ‘d.16’, ‘d.17’, ‘d.18’, ‘d.19’, ‘d.2’, ‘d.3’, ‘d.4’, ‘d.5’, ‘d.6’, ‘d.7’, ‘d.8’, ‘d.9’ 

回答1:


Here's another approach, which includes a short cut for obtaining the FTP directory listing. As @Pop mentioned, the key is to ensure that the IDs are all unique.

library(RCurl) 
library(rgdal)

# get the directory listing
u <- 'ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/'
f <- paste0(u, strsplit(getURL(u, ftp.use.epsv = FALSE, ftplistonly = TRUE), 
                        '\\s+')[[1]])

# download and extract to tempdir/shps
invisible(sapply(f, function(x) {
  path <- file.path(tempdir(), basename(x))
  download.file(x, destfile=path, mode = 'wb')
  unzip(path, exdir=file.path(tempdir(), 'shps'))
}))

# read in all shps, and prepend shapefile name to IDs
shps <- lapply(sub('\\.zip', '', basename(f)), function(x) {
  shp <- readOGR(file.path(tempdir(), 'shps'), x)
  shp <- spChFIDs(shp, paste0(x, '_', sapply(slot(shp, "polygons"), slot, "ID")))
  shp
})

# rbind to a single object
shp <- do.call(rbind, as.list(shps))

# plot (note: clipping to contiguous states for display purposes)
plot(shp, axes=T, xlim=c(-130, -60), ylim=c(20, 50), las=1)

# write out to wd/USA.shp
writeOGR(shp, '.', 'USA', 'ESRI Shapefile')




回答2:


Your problem as you should have guessed is due to the fact that there are duplicate polygon IDs in your object d.

Indeed, all the polygon IDs in your "shp" files are "0". Thus, you used fix.duplicated.IDs = TRUE to make them different.

This is weird because the taRifx.geo:::rbind.SpatialPolygonsDataFrame should have fixed it as you set fix.duplicated.IDs = TRUE. More accurately, the information is transmitted to sp::rbind.SpatialPolygons which calls the "internal" function sp:::makeUniqueIDs, which finally uses the function base::make.unique.

I did not want to see what went wrong in this chain. Alternatively, I advise you to set up yourself the IDs of your polygons, instead of using the fix.duplicated.IDs option.

To fix it by yourself, replace your for-loop by the following code:

d <- NULL
count <- 0
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )

    for (j in 1:length(b@polygons))
        b@polygons[[j]]@ID <- as.character(j + count)
    count <- count + length(b@polygons)

    if ( is.null( d ) ) 
       d <- b 
    else 
       d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b )
}

The simple for-loop on j only changes the ID of each polygon in the object b before biding it to d.



来源:https://stackoverflow.com/questions/26357580/how-to-combine-state-level-shapefiles-from-the-united-states-census-bureau-into

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!