Keep region names when tidying a map using the broom package

橙三吉。 Submitted on 2019-12-06 00:06:06

You can use the join() function from the plyr package. Here is a general solution (it looks long, but it is actually very easy):

  1. Load shapefile: Let us say you have a shapefile my_shapefile.shp in your working directory. Let's load it with readOGR() from the rgdal package:

    library(rgdal)  # provides readOGR()
    shape <- readOGR(dsn = "/my_working_directory", layer = "my_shapefile")
    

    Notice that the loaded object carries a dataframe of attributes, which can be accessed with shape@data. For example, this dataframe could look like this:

    > head(shape@data)
           code                   region     label
    0 E12000006          East of England E12000006
    1 E12000007                   London E12000007
    2 E12000002               North West E12000002
    3 E12000001               North East E12000001
    4 E12000004            East Midlands E12000004
    5 E12000003 Yorkshire and The Humber E12000003
    
  2. Create new dataframe from shapefile: Use the broom package to tidy the shapefile dataframe:

    library(broom)  # provides tidy()
    new_df <- tidy(shape)
    

This results in something like this:

> head(new_df)
      long      lat order  hole piece group id           
1 547491.0 193549.0     1 FALSE     1   0.1  0 
2 547472.1 193465.5     2 FALSE     1   0.1  0 
3 547458.6 193458.2     3 FALSE     1   0.1  0 
4 547455.6 193456.7     4 FALSE     1   0.1  0 
5 547451.2 193454.3     5 FALSE     1   0.1  0 
6 547447.5 193451.4     6 FALSE     1   0.1  0

Unfortunately, tidy() dropped the region names ("region", in this example). Instead, we got a new variable "id", starting at 0. Fortunately, "id" follows the same order as the rows of shape@data, so id 0 corresponds to the first entry of shape@data$region, id 1 to the second, and so on. Let us use this to recover the names.
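Before relying on that ordering, it may be worth checking it. Here is a minimal sketch in base R only. The toy dataframes attr_df (standing in for shape@data) and tidy_df (standing in for the tidy() output) are hypothetical, and the sketch assumes the polygon IDs are stored as the row names of the attribute table, which is how readOGR() usually sets them up:

```r
# Toy stand-in for shape@data: row names are the polygon IDs "0", "1", "2".
attr_df <- data.frame(
  region = c("East of England", "London", "North West"),
  row.names = c("0", "1", "2"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the tidy() output: one row per vertex, "id" as character.
tidy_df <- data.frame(
  long = c(1, 2, 3, 4, 5, 6),
  lat  = c(6, 5, 4, 3, 2, 1),
  id   = c("0", "0", "1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# The join is only safe if the ids in the tidy output and the row names
# of the attribute table are the same set of values.
ids_match <- setequal(unique(tidy_df$id), rownames(attr_df))
ids_match
```

With real data you would put shape@data in place of attr_df and new_df in place of tidy_df. If the check fails, the "id" values should not be treated as row positions.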

  3. Create auxiliary dataframe with region names: Let us create a new dataframe holding the region names. Additionally, we will add an "id" variable that matches the one tidy() created:

    # Recover the region names
    temp_df <- data.frame(region = shape@data$region)
    # Create and append "id" (as character, to match the "id" column from tidy())
    temp_df$id <- as.character(seq(0, nrow(temp_df) - 1))
    
  4. Merge region names with new dataframe using "id": Finally, let us put the names back into the new dataframe:

    library(plyr)  # provides join()
    new_df <- join(new_df, temp_df, by = "id")
    

That's it! You can even add more variables to the new dataframe, by using the join command and the "id" index. The final result would be something like:

> head(new_df)
      long      lat order  hole piece group id          region    var1    var2
1 547491.0 193549.0     1 FALSE     1   0.1  0 East of England   0.525   0.333   
2 547472.1 193465.5     2 FALSE     1   0.1  0 East of England   0.525   0.333   
3 547458.6 193458.2     3 FALSE     1   0.1  0 East of England   0.525   0.333   
4 547455.6 193456.7     4 FALSE     1   0.1  0 East of England   0.525   0.333   
5 547451.2 193454.3     5 FALSE     1   0.1  0 East of England   0.525   0.333   
6 547447.5 193451.4     6 FALSE     1   0.1  0 East of England   0.525   0.333   
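As a rough illustration of adding more variables through "id", here is a small base R sketch that uses match() instead of plyr's join(). Everything in it is a toy assumption: tidy_df stands in for the tidy() output, stats_df is a hypothetical per-region table, and the var1/var2 values for London are made up for the example.

```r
# Toy stand-in for the tidy() output: two vertices for each of two polygons.
tidy_df <- data.frame(
  long  = c(547491.0, 547472.1, 530000.0, 530010.5),
  lat   = c(193549.0, 193465.5, 180000.0, 180020.0),
  order = 1:4,
  id    = c("0", "0", "1", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical per-region table, keyed by the same "id" values.
stats_df <- data.frame(
  id     = c("0", "1"),
  region = c("East of England", "London"),
  var1   = c(0.525, 0.410),
  var2   = c(0.333, 0.250),
  stringsAsFactors = FALSE
)

# For every vertex row, find the position of its id in stats_df,
# then copy the matching region-level columns next to the coordinates.
idx <- match(tidy_df$id, stats_df$id)
joined <- cbind(tidy_df, stats_df[idx, c("region", "var1", "var2")])
rownames(joined) <- NULL  # drop the duplicated row names left by indexing
joined
```

Because match() only looks values up, the vertex rows stay in their original order, which matters when the polygons are drawn later.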

alistaire's comment encouraged me to keep experimenting with the region= parameter. After many iterations, I found some useful ideas in this thread: https://github.com/tidyverse/ggplot2/issues/1447.

Here is the code that grabs the district names:

# load the magrittr library to get the pipe
library(magrittr)
# load the maptools library, which tidy() needs when region= is given
library(maptools)

arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>% 
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>% 
  # tidy to a dataframe
  broom::tidy(region="NAME_1")

# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)

First of all, notice that the maptools library must be loaded for the tidy operation to work correctly. Also, I want to highlight that the name of the variable holding the region information must be passed as a quoted string. I had incorrectly assumed that broom would recognize the variable name the way other tidyverse packages such as dplyr do, where column names can be given unquoted or surrounded by backticks.
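To illustrate the difference between a quoted and an unquoted column name, here is a small base R sketch. The helper region_values() and the toy dataframe provinces are hypothetical. The sketch assumes the column is looked up by its string name and is not meant to show how broom is implemented:

```r
# Hypothetical helper: looks a column up by a name passed as a string,
# similar in spirit to passing region = "NAME_1" to tidy().
region_values <- function(df, region) {
  stopifnot(is.character(region), length(region) == 1, region %in% names(df))
  df[[region]]
}

# Toy attribute table with a NAME_1 column.
provinces <- data.frame(NAME_1 = c("Salta", "Jujuy"), stringsAsFactors = FALSE)

# Quoted name: the string is used to select the column.
quoted <- region_values(provinces, "NAME_1")

# Unquoted name: R tries to evaluate NAME_1 as a variable, which fails
# here because no object of that name exists outside the dataframe.
unquoted <- tryCatch(
  region_values(provinces, NAME_1),
  error = function(e) "error"
)
```

Functions like dplyr's verbs capture the unquoted expression before it is evaluated, which is why they accept bare column names. A function that simply indexes by string, as in this sketch, does not.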
