Creating a Sankey Diagram using the networkD3 package in R

悲哀的现实 2020-12-31 23:31

Currently I am trying to create an interactive Sankey diagram with the networkD3 package, following the instructions by Christopher Gandrud (https://christophergandrud.github

2 answers
  •  长情又很酷
    2021-01-01 00:25

    You need two data frames: one listing all nodes (containing their names) and one listing the links. The links data frame has three columns: the source node, the target node, and a value that sets the strength (width) of the link. In the links data frame you refer to each node by its zero-based position in the nodes data frame.
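
    As a minimal sketch of that structure, here is a toy pair of data frames. The node names "A", "B", "C" and the values are made up for illustration and are not from the question:

    ```r
    # Hypothetical toy data: three nodes and three links between them
    nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

    # source/target are zero-based row positions in `nodes`:
    # 0 = "A", 1 = "B", 2 = "C"
    links <- data.frame(source = c(0, 0, 1),
                        target = c(1, 2, 2),
                        value  = c(5, 3, 2))

    # Every index must point at an existing row of `nodes`
    valid_idx <- seq_len(nrow(nodes)) - 1
    stopifnot(all(c(links$source, links$target) %in% valid_idx))
    ```

    An index outside 0 to nrow(nodes) - 1 would refer to a node that does not exist, which is worth checking before drawing.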

    Assuming your data looks like this:

    df <- data.frame(Year1=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                     Year2=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                     Year3=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                     Year4=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                     stringsAsFactors = FALSE)
    

    For the diagram you need to distinguish not only between the hotels but between the hotel/year combinations, since each combination should be its own node:

    df$Year1 <- paste0("Year1_", df$Year1)
    df$Year2 <- paste0("Year2_", df$Year2)
    df$Year3 <- paste0("Year3_", df$Year3)
    df$Year4 <- paste0("Year4_", df$Year4)
    

    The links are the "transitions" between the hotels from one year to the next:

    library(dplyr)
    trans1_2 <- df %>% group_by(Year1, Year2) %>% summarise(sum=n())
    trans2_3 <- df %>% group_by(Year2, Year3) %>% summarise(sum=n())
    trans3_4 <- df %>% group_by(Year3, Year4) %>% summarise(sum=n())
    
    colnames(trans1_2)[1:2] <- colnames(trans2_3)[1:2] <- colnames(trans3_4)[1:2] <- c("source","target")
    
    links <- rbind(as.data.frame(trans1_2), 
                   as.data.frame(trans2_3), 
                   as.data.frame(trans3_4))
    

    Finally, the node names in the links data frame need to be replaced by their zero-based positions in the nodes data frame:

    nodes <- data.frame(name=unique(c(links$source, links$target)))
    links$source <- match(links$source, nodes$name) - 1
    links$target <- match(links$target, nodes$name) - 1
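
    To see what the match() step does, here is a small self-contained sketch on two made-up link rows (the counts 3 and 4 are hypothetical). match() returns one-based positions, so subtracting 1 gives zero-based indices, and adding 1 back recovers the original labels:

    ```r
    # Hypothetical links, still holding node names instead of indices
    links <- data.frame(source = c("Year1_Hotel1", "Year1_Hotel2"),
                        target = c("Year2_Hotel2", "Year2_Hotel2"),
                        sum    = c(3, 4),
                        stringsAsFactors = FALSE)
    labels_before <- links$source

    # One row per distinct node name, in order of first appearance
    nodes <- data.frame(name = unique(c(links$source, links$target)),
                        stringsAsFactors = FALSE)

    # match() is one-based, so shift down by 1
    links$source <- match(links$source, nodes$name) - 1  # 0, 1
    links$target <- match(links$target, nodes$name) - 1  # 2, 2

    # Round trip: index + 1 looks the label up again
    stopifnot(identical(nodes$name[links$source + 1], labels_before))
    ```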
    

    Then the diagram can be drawn:

    library(networkD3)
    sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
                  Target = "target", Value = "sum", NodeID = "name",
                  fontSize = 12, nodeWidth = 30)
    

    There might be more elegant solutions, but this could be a starting point for your problem. If you don't like the "Year..." prefix in the node names, you can remove it after setting up the data frames.
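
    One possible way to strip the prefix, sketched on a made-up nodes data frame. It assumes the prefix always has the form "Year<number>_" as built above, and it should run only after the links have been converted to indices, since the labels are no longer unique afterwards:

    ```r
    # Hypothetical nodes data frame with the prefixed labels
    nodes <- data.frame(name = c("Year1_Hotel1", "Year2_Hotel1", "Year2_Hotel3"),
                        stringsAsFactors = FALSE)

    # Drop a leading "Year<digits>_" from each label
    nodes$name <- sub("^Year[0-9]+_", "", nodes$name)

    # nodes$name is now "Hotel1" "Hotel1" "Hotel3"
    ```

    The links refer to nodes by position rather than by name, so the repeated "Hotel1" label should not merge the two nodes.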
