Visualize in R flow from one set of objects to another

自古美人都是妖i submitted on 2019-12-09 13:46:27

Question


I am interested in how grants are reviewed at the NIH. The grant review process works like this: Congress allocates funding to various institutes (e.g., the National Cancer Institute, or NCI), and individual grants are submitted to these institutes. The institutes are organized around funding priorities (e.g., cancer, infectious diseases, etc.).

However, when grants are reviewed, they are typically (but not always) sent to individual study sections, which are organized more around scientific disciplines. Thus, the "Tumor Progression" study section can find itself reviewing grants from both the National Cancer Institute and the National Heart, Lung, and Blood Institute (NHLBI), for example if a researcher submits a grant to NHLBI to study leukemia.

I have a data frame in R that looks something like this:

library(car)  # provides some(), which prints a random sample of rows

grant_id <- 1:100
# 5 institutes, 20 grants each, in random order
funding_agency <- sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20))
# 10 study sections, 10 grants each, in random order
study_section <- sample(rep(c("Tumor Cell Biology", "Tumor Progression", 
                              "Vector Biology", "Molecular Genetics", 
                              "Medical Imaging", "Macromolecular Structure",
                              "Infectious Diseases", "Drug Discovery", 
                              "Cognitive Neuroscience", "Aging and Geriatrics"), 
                            10))
# simulated cost per grant
total_cost <- rnorm(100, mean = 30000, sd = 10000)
d <- data.frame(grant_id, funding_agency, study_section, total_cost)

some(d)

   grant_id funding_agency          study_section total_cost
15       15          NINDS         Vector Biology   25242.19
19       19            NCI    Infectious Diseases   29075.21
50       50            NCI         Drug Discovery   25176.35
62       62            NCI      Tumor Progression   14264.34
64       64          NIAID     Tumor Cell Biology   30024.13

I would like to create two visualizations of these data, ideally using R: one showing how the grants submitted to each institute are assigned to study sections, and a second showing the dollar amounts that flow from each institute to each study section. What I ultimately want is a chart like the ones on the following websites:

Migration flow

College major to job pipelines

Does anybody know of an R package and/or have some sample code to create a chart like the ones on the websites above? Alternatively, is there a different visualization I should consider that would accomplish the same goals?
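Both charts described above start from the same edge list: one row per institute/study-section pair, weighted either by the number of grants or by their summed `total_cost`. Flow diagrams such as Sankey charts are typically drawn from a table like this. The following is a minimal base-R sketch, assuming the question's simulated `d` (recreated here with an arbitrary `set.seed()` so it runs on its own); the names `counts` and `dollars` are illustrative only.

```r
# Recreate the question's simulated data (the seed is an arbitrary choice)
set.seed(1)
sections <- c("Tumor Cell Biology", "Tumor Progression", "Vector Biology",
              "Molecular Genetics", "Medical Imaging", "Macromolecular Structure",
              "Infectious Diseases", "Drug Discovery", "Cognitive Neuroscience",
              "Aging and Geriatrics")
d <- data.frame(
  grant_id       = 1:100,
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section  = sample(rep(sections, 10)),
  total_cost     = rnorm(100, mean = 30000, sd = 10000),
  stringsAsFactors = FALSE
)

# View 1: number of grants each institute sends to each study section
counts <- as.data.frame(table(source = d$funding_agency, target = d$study_section),
                        responseName = "value", stringsAsFactors = FALSE)
counts <- counts[counts$value > 0, ]  # drop institute/section pairs with no grants

# View 2: total dollars each institute sends to each study section
dollars <- aggregate(total_cost ~ funding_agency + study_section, data = d, FUN = sum)
names(dollars) <- c("source", "target", "value")

head(counts)
head(dollars)
```

The `source`/`target`/`value` column names match the ones the rCharts Sankey template expects in the first answer, so either table can be handed to a flow chart directly.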


Answer 1:


Here is how to do it with rCharts. You can view the final Sankey plot here

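# Rename the question's columns to the names the Sankey template expects
d <- data.frame(
  id = grant_id, 
  source = funding_agency, 
  target = study_section, 
  value = total_cost
)
# rCharts is not on CRAN; install the dev branch from GitHub:
# devtools::install_github("ramnathv/rCharts", ref = "dev")
library(rCharts)
sankeyPlot <- rCharts$new()
# Use the d3 Sankey template as the chart library
sankeyPlot$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')
sankeyPlot$set(
  data = d,
  nodeWidth = 15,    # width of each node bar, in pixels
  nodePadding = 10,  # vertical gap between nodes, in pixels
  layout = 32,       # number of layout iterations
  width = 750,
  height = 500,
  labelFormat = ".1%"
)
sankeyPlot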

To save the chart as a standalone HTML file:

sankeyPlot$save('mysankey.html')
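rCharts has to be installed from GitHub and pulls its D3 template from an external URL. If that is inconvenient, the CRAN package networkD3 offers `sankeyNetwork()`, which draws the same kind of chart; note that this is a different package swapped in as an alternative, not part of the answer above. It takes a node table plus a link table whose source/target columns are zero-based row indices into the node table. A sketch, assuming the question's simulated data (recreated with an arbitrary seed); the widget settings mirror the rCharts call, and `links`/`nodes` are illustrative names.

```r
# Recreate the question's simulated data (arbitrary seed)
set.seed(1)
sections <- c("Tumor Cell Biology", "Tumor Progression", "Vector Biology",
              "Molecular Genetics", "Medical Imaging", "Macromolecular Structure",
              "Infectious Diseases", "Drug Discovery", "Cognitive Neuroscience",
              "Aging and Geriatrics")
d <- data.frame(
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section  = sample(rep(sections, 10)),
  total_cost     = rnorm(100, mean = 30000, sd = 10000),
  stringsAsFactors = FALSE
)

# One link per institute/section pair, weighted by summed dollars
links <- aggregate(total_cost ~ funding_agency + study_section, data = d, FUN = sum)
names(links) <- c("source", "target", "value")

# Node table: every institute and every study section, each listed once
nodes <- data.frame(name = unique(c(as.character(links$source),
                                    as.character(links$target))),
                    stringsAsFactors = FALSE)

# networkD3 expects zero-based integer indices into `nodes`
links$source_id <- match(links$source, nodes$name) - 1L
links$target_id <- match(links$target, nodes$name) - 1L

# Only build the widget if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source_id", Target = "target_id", Value = "value",
    NodeID = "name", units = "USD", fontSize = 12,
    nodeWidth = 15, nodePadding = 10, iterations = 32,
    width = 750, height = 500
  )
  # print(p) shows it in the viewer or browser
}
```

For the count-based version of the chart, replace the `aggregate()` call with per-pair grant counts. To save the widget, `htmlwidgets::saveWidget(p, "mysankey.html")` plays the role of `sankeyPlot$save()`.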




Answer 2:


I can't help much with the visualization piece, but what you need first is a two-way table of the data.

Using the reshape2 package and ignoring grant_id:

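library(reshape2)
# Long format: one row per grant; total_cost becomes the "value" column
d1 <- melt(d[, 2:4], id.vars = c("funding_agency", "study_section"))
# Rows = study sections, columns = institutes, cells = summed cost
d2 <- dcast(d1, study_section ~ funding_agency, fun.aggregate = sum)
d2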
              study_section      NCI     NHLBI     NIAID     NIGMS     NINDS
1      Aging and Geriatrics 28598.04  76524.55      0.00 109492.59 138330.12
2    Cognitive Neuroscience 76484.18  88217.42  78126.55  71546.62  73132.14
3            Drug Discovery 43667.30  39683.03  23797.24  46363.75 105655.61
4       Infectious Diseases 65375.44 136462.03  96413.08  34653.48  13835.22
5  Macromolecular Structure 84308.64  42290.61  39886.87  61645.00  67550.41
6           Medical Imaging 26264.32  86736.36 106356.13  41001.21  35549.83
7        Molecular Genetics 49473.72      0.00 110201.52  69468.03  86688.24
8        Tumor Cell Biology 99930.88  50862.39  95394.23  26269.98  46944.60
9         Tumor Progression 58719.89  52669.80  86874.89      0.00 119264.59
10           Vector Biology 64251.66  30880.81  66734.26 125524.72      0.00

This tells you how much grant money each study section received from each funding agency. How to display it is a separate question; the vcd package may be worth a look: http://statmath.wu.ac.at/projects/vcd/
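Base R can also build the same kind of two-way table without reshape2, via `xtabs()` with a left-hand side to sum, and can draw a quick picture of the counts with `mosaicplot()`; `vcd::mosaic()` from the package linked above offers finer control over the same plot type. A sketch assuming the question's simulated `d`, recreated here with an arbitrary seed, so the numbers will not match the table above; `dollar_tab` and `count_tab` are illustrative names.

```r
# Recreate the question's simulated data (arbitrary seed)
set.seed(1)
sections <- c("Tumor Cell Biology", "Tumor Progression", "Vector Biology",
              "Molecular Genetics", "Medical Imaging", "Macromolecular Structure",
              "Infectious Diseases", "Drug Discovery", "Cognitive Neuroscience",
              "Aging and Geriatrics")
d <- data.frame(
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section  = sample(rep(sections, 10)),
  total_cost     = rnorm(100, mean = 30000, sd = 10000),
  stringsAsFactors = FALSE
)

# Dollar view: same layout as the dcast() table (rows = study sections);
# empty cells come out as 0
dollar_tab <- xtabs(total_cost ~ study_section + funding_agency, data = d)
round(addmargins(dollar_tab), 2)  # adds "Sum" row and column totals

# Count view: number of grants in each institute/study-section cell
count_tab <- xtabs(~ funding_agency + study_section, data = d)

# Mosaic plot of the counts: column widths follow institute totals, and tile
# heights within a column follow that institute's split across study sections
mosaicplot(count_tab, las = 2, cex.axis = 0.6, color = TRUE,
           main = "Grants by institute and study section")
```

A mosaic plot shows the same flows as a Sankey chart in a static, grid-like form, which can be easier to read when most institutes send grants to most study sections.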



Source: https://stackoverflow.com/questions/19730604/visualize-in-r-flow-from-one-set-of-objects-to-another
