ggplot/mapping US counties — problems with visualization shapes in R

旧时模样 提交于 2019-11-28 18:28:57

Maybe a little late for another answer, but still worthwhile to share I think.

The reading and preprocessing of the data is similar to jlhoward's answer, with some differences:

library(tmap)      # package for plotting
library(readxl)    # for reading Excel
library(maptools)  # for unionSpatialPolygons

# download data
download.file("http://www.ers.usda.gov/datafiles/Food_Environment_Atlas/Data_Access_and_Documentation_Downloads/Current_Version/DataDownload.xls", destfile = "DataDownload.xls", mode="wb")
df <- read_excel("DataDownload.xls", sheet = "HEALTH")

# download shape (a little less detail than in the other scripts)
f <- tempfile()
download.file("http://www2.census.gov/geo/tiger/GENZ2010/gz_2010_us_050_00_20m.zip", destfile = f)
unzip(f, exdir = ".")
US <- read_shape("gz_2010_us_050_00_20m.shp")

# leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US <- US[!(US$STATE %in% c("02","15","72")),]  

# append data to shape
US$FIPS <- paste0(US$STATE, US$COUNTY)
US <- append_data(US, df, key.shp = "FIPS", key.data = "FIPS")

When the correct data is attached to the shape object, a choropleth can be drawn with one line of code:

qtm(US, fill = "PCT_OBESE_ADULTS10")

This could be enhanced by adding state borders, a better projection, and a title:

# create shape object with state polygons
US_states <- unionSpatialPolygons(US, IDs=US$STATE)

tm_shape(US, projection="+init=epsg:2163") +
  tm_polygons("PCT_OBESE_ADULTS10", border.col = "grey30", title="") +
tm_shape(US_states) +
  tm_borders(lwd=2, col = "black", alpha = .5) +
tm_layout(title="2010 Adult Obesity by County, percent", 
          title.position = c("center", "top"),
          legend.text.size=1)

So this is a similar example but attempts to accommodate the format of your obesity_map dataset. It also uses a data table join which is much faster than merge(...), especially with large datasets like yours.

library(ggplot2)
# this creates an example formatted as your obesity.map - you have this already...
set.seed(1)    # for reproducible example
map.county <- map_data('county')
counties   <- unique(map.county[,5:6])
obesity_map <- data.frame(state_names=counties$region, 
                          county_names=counties$subregion, 
                          obesity= runif(nrow(counties), min=0, max=100))

# you start here...
library(data.table)   # use data table merge - it's *much* faster
map.county <- data.table(map_data('county'))
setkey(map.county,region,subregion)
obesity_map <- data.table(obesity_map)
setkey(obesity_map,state_names,county_names)
map.df      <- map.county[obesity_map]

ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) + 
  geom_polygon()+coord_map()

Also, if your dataset has the FIPS codes, which it seems to, I'd strongly recommend you use the US Census Bureau's TIGER/Line county shapefile (which also has these codes), and merge on that. This is much more reliable. For example, in your extract of the obesity_map data frame, the states and counties are capitalized, whereas in the built-in counties dataset in R, they are not, so you would have to deal with that. Also, the TIGER file is up to date, whereas the internal dataset is not.

So this is kind of an interesting question. Turns out the actual obesity data is on the USDA website and can be downloaded here as an MSExcel file. There's also a shapfile of US counties on the Census Bureau website, here. Both the Excel file and the shapefile have FIPS information. In R this can be put together relatively simply:

library(XLConnect)    # for loadWorkbook(...) and readWorksheet(...)
library(rgdal)        # for readOGR(...)
library(RcolorBrewer) # for brewer.pal(...)
library(data.table)

setwd(" < directory with all your files > ")
wb <- loadWorkbook("DataDownload.xls")   # from the USDA website
df <- readWorksheet(wb,"HEALTH")         # this sheet has the obesity data

US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
#leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]  
county.data <- US.counties@data
county.data <- cbind(id=rownames(county.data),county.data)
county.data <- data.table(county.data)
county.data[,FIPS:=paste0(STATE,COUNTY)] # this is the state + county FIPS code
setkey(county.data,FIPS)      
obesity.data <- data.table(df)
setkey(obesity.data,FIPS)
county.data[obesity.data,obesity:=PCT_OBESE_ADULTS10]

map.df <- data.table(fortify(US.counties))
setkey(map.df,id)
setkey(county.data,id)
map.df[county.data,obesity:=obesity]

ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) +
  scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
  geom_polygon()+coord_map()+
  labs(title="2010 Adult Obesity by Country, percent",x="",y="")+
  theme_bw()

to produce this:

This is something I can get working managing the mapping variable. Renaming it to 'region'.

library(ggplot2)
library(maps)
m.usa <- map_data("county")
m.usa$id <- m.usa$subregion
m.usa <- m.usa[ ,-5]
names(m.usa)[5] <- 'region'


df <- data.frame(region = unique(m.usa$region),
                 obesity = rnorm(length(unique(m.usa$region)), 50, 10),
                 stringsAsFactors = F)

head(df)
region  obesity
1 autauga 44.54833
2 baldwin 68.61470
3 barbour 52.19718
4    bibb 50.88948
5  blount 42.73134
6 bullock 59.93515

ggplot(df, aes(map_id = region)) +
  geom_map(aes(fill = obesity), map = m.usa) + 
  expand_limits(x = m.usa$long, y = m.usa$lat) +
  coord_map()

I think all you needed to do was reorder the map.county variable like you had for the map.data variable previously.

....
map.county <- merge(county.obesity, map.county, all=TRUE)

## reorder the map before plotting
map.county <- map.county[order(map.data$county),] 

## plot
ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) + geom_polygon(colour = "white", size = 0.1)

Building on @jlhoward's answer: the code with data.table fails for me in a mysterious way:

 Error in `:=`(FIPS, paste0(STATE, COUNTY)) : 
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":="). 

This error happened to me several times, but only when the code was inside a function, even just a minimal wrapper. It worked fine in a script. Although now I cannot reproduce the error, I adapted his/her code with merge() instead of data.table for completeness:

library(rgdal)        # for readOGR(...)
library(ggplot2)      # for fortify() and plot()
library(RColorBrewer) # for brewer.pal(...)

US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
#leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]
county.data <- US.counties@data

county.data <- cbind(id=rownames(county.data),county.data)
county.data$FIPS <- paste0(county.data$STATE, county.data$COUNTY) # this is the state + county FIPS code

df <- data.frame(FIPS=county.data$FIPS,
                 PCT_OBESE_ADULTS10= runif(nrow(county.data), min=0, max=100))

# Merge county.data to obesity
county.data <- merge(county.data,
                     df,
                     by.x = "FIPS",
                     by.y = "FIPS")

map.df <- fortify(US.counties)

# Merge the map to county.data
map.df <- merge(map.df, county.data, by.x = "id", by.y = "id")

ggplot(map.df, aes(x=long, y=lat, group=group, fill=PCT_OBESE_ADULTS10)) +
  scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
  geom_polygon()+coord_map()+
  labs(title="2010 Adult Obesity by Country, percent",x="",y="")+
  theme_bw()

I'm a little new at using TMAP and Spatial data, but figured I would post as a follow up to Martijn Tennekes. Using his advice I ran into an error in the second map (with the state borders). When running this line of code:

US_state <- unionSpatialPolygons(US,US$STATE)

I kept getting this error: "Error in unionSpatialPolygons(US, US$STATE) : not a SpatialPolygons object"

In order to rectify I had to use a different variable and run it as a Spatial Polygon Data Frame:

US <- read_shape("gz_2010_us_050_00_20m.shp")
US2<-readShapeSpatial("gz_2010_us_050_00_20m.shp")

US <- US[!(US$STATE %in% c("02","15","72")),]  

US$FIPS <- paste0(US$STATE, US$COUNTY)
US <- append_data(US, med_inc_df, key.shp = "FIPS", key.data = "GEOID")

#the difference is here:
US_states <- unionSpatialPolygons(US2, US2$STATE)

tm_shape(US, projection="+init=epsg:2163") +
  tm_polygons("estimate", border.col = "grey30", title="") +
  tm_shape(US_states) +
  tm_borders(lwd=2, col = "black", alpha = .5) +
  tm_layout(title="2016 Median Income by County", 
            title.position = c("center", "top"),
            legend.text.size=1)

My Map

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!