ggplot/mapping US counties — problems with visualization shapes in R

迷失自我 2020-12-13 16:14

So I have a data frame in R called obesity_map which basically gives me the state, county, and obesity rate per county. It looks more or less like this:

obesi         


        
6 answers
  •  清歌不尽
    2020-12-13 17:02

    This is a similar example, adapted to the format of your obesity_map dataset. It also uses a data.table join, which is much faster than merge(...), especially with large datasets like yours.

    library(ggplot2)
    # this creates an example formatted like your obesity_map - you have this already...
    set.seed(1)    # for reproducible example
    map.county <- map_data('county')
    counties   <- unique(map.county[,5:6])
    obesity_map <- data.frame(state_names=counties$region, 
                              county_names=counties$subregion, 
                              obesity= runif(nrow(counties), min=0, max=100))
    
    # you start here...
    library(data.table)   # use data table merge - it's *much* faster
    map.county <- data.table(map_data('county'))
    setkey(map.county,region,subregion)
    obesity_map <- data.table(obesity_map)
    setkey(obesity_map,state_names,county_names)
    map.df      <- map.county[obesity_map]
    
    ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) + 
      geom_polygon()+coord_map()
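
To make the keyed join above concrete, here is a minimal sketch on toy data. The table names `shapes` and `rates` and all values are made up for illustration; only the `X[Y]` join pattern is taken from the answer. It also shows one thing worth checking: `shapes[rates]` drops polygons that have no rate, while `rates[shapes]` keeps them with `obesity = NA`.

```r
library(data.table)

# Toy stand-ins (hypothetical values) for the polygon table and the rate table
shapes <- data.table(
  region    = c("alabama", "alabama", "alabama"),
  subregion = c("autauga", "autauga", "baldwin"),
  long      = c(-86.5, -86.6, -87.9),
  lat       = c(32.3, 32.4, 30.6)
)
rates <- data.table(
  state_names  = c("alabama", "alabama"),
  county_names = c("autauga", "winston"),
  obesity      = c(30, 35)
)
setkey(shapes, region, subregion)
setkey(rates, state_names, county_names)

# X[Y]: rows of shapes that match each row of rates, driven by rates
joined <- shapes[rates]

# Rates that matched no polygon come back with NA coordinates
unmatched_rates <- joined[is.na(long), .(region, subregion)]

# Swapping the tables keeps every polygon point; counties without a rate
# get obesity = NA instead of disappearing from the map
all_polygons <- rates[shapes]
```

If holes appear in the plotted map, inspecting `unmatched_rates` (or the `NA` rows of `all_polygons`) is a quick way to see which names failed to line up.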
    

    Also, if your dataset has the FIPS codes, which it seems to, I'd strongly recommend you use the US Census Bureau's TIGER/Line county shapefile (which also has these codes), and merge on that. This is much more reliable. For example, in your extract of the obesity_map data frame, the states and counties are capitalized, whereas in the built-in counties dataset in R, they are not, so you would have to deal with that. Also, the TIGER file is up to date, whereas the internal dataset is not.
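
If you do stay with name matching, the capitalization mismatch has to be handled before the join. A minimal sketch, assuming the built-in county names are lower case without periods; `normalize_name` is a hypothetical helper and the obesity values are made up, so check any unmatched names against `map_data('county')` yourself:

```r
# Hypothetical helper: lower-case a name, strip periods, collapse extra spaces
normalize_name <- function(x) {
  x <- tolower(x)
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# Toy extract with capitalized names (obesity values are made up)
obesity_map <- data.frame(
  state_names  = c("Alabama", "Alabama"),
  county_names = c("Autauga", "St. Clair"),
  obesity      = c(30, 35),
  stringsAsFactors = FALSE
)

# Normalize both key columns before setting keys and joining
obesity_map$state_names  <- normalize_name(obesity_map$state_names)
obesity_map$county_names <- normalize_name(obesity_map$county_names)
```

Name-based keys tend to need more special cases than this, which is part of why merging on FIPS codes is the more reliable route.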

    So this is kind of an interesting question. It turns out the actual obesity data is on the USDA website and can be downloaded as an MS Excel file. There's also a shapefile of US counties on the Census Bureau website. Both the Excel file and the shapefile have FIPS information. In R this can be put together relatively simply:

    library(ggplot2)      # for fortify(...) and ggplot(...)
    library(XLConnect)    # for loadWorkbook(...) and readWorksheet(...)
    library(rgdal)        # for readOGR(...)
    library(RColorBrewer) # for brewer.pal(...)
    library(data.table)
    
    setwd(" < directory with all your files > ")
    wb <- loadWorkbook("DataDownload.xls")   # from the USDA website
    df <- readWorksheet(wb,"HEALTH")         # this sheet has the obesity data
    
    US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
    #leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
    US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]  
    county.data <- US.counties@data
    county.data <- cbind(id=rownames(county.data),county.data)
    county.data <- data.table(county.data)
    county.data[,FIPS:=paste0(STATE,COUNTY)] # this is the state + county FIPS code
    setkey(county.data,FIPS)      
    obesity.data <- data.table(df)
    setkey(obesity.data,FIPS)
    county.data[obesity.data,obesity:=PCT_OBESE_ADULTS10]
    
    map.df <- data.table(fortify(US.counties))
    setkey(map.df,id)
    setkey(county.data,id)
    map.df[county.data,obesity:=obesity]
    
    ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) +
      scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
      geom_polygon()+coord_map()+
      labs(title="2010 Adult Obesity by County, percent",x="",y="")+
      theme_bw()
    

    to produce a county-level choropleth of adult obesity for the lower 48 states.
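
One thing to check in the FIPS join above: `paste0(STATE,COUNTY)` on the shapefile side yields 5-character strings such as "01001". If the FIPS column read from the Excel sheet arrives as a number or as unpadded text (an assumption; inspect it with `str(df)`), the keys will not match until it is zero-padded. A minimal sketch, where `pad_fips` is a hypothetical helper:

```r
# Hypothetical helper: turn numeric or unpadded FIPS codes into
# 5-character, zero-padded strings
pad_fips <- function(x) {
  sprintf("%05d", as.integer(x))
}

# Works for numbers and for character input alike
padded <- pad_fips(c(1001, 6037, 48201))
padded_chr <- pad_fips(c("1001", "01001"))
```

With that in place, something like `df$FIPS <- pad_fips(df$FIPS)` before building `obesity.data` should make the two key columns comparable.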
