Question
I am interested in how grants are reviewed at the NIH. Congress allocates funding to various institutes (e.g., the National Cancer Institute, or NCI), and individual grants are submitted to these institutes. The institutes are organized around funding priorities (e.g., cancer or infectious diseases).
However, when grants are reviewed, they are typically (but not always) sent to individual study sections, which are organized more around scientific disciplines. Thus, the "Tumor Progression" study section can find itself reviewing grants from both the National Cancer Institute and the National Heart, Lung, and Blood Institute (NHLBI) if a researcher submits a grant to NHLBI to study leukemia.
I have a data frame in R that looks something like this:
grant_id <- 1:100
funding_agency <- sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20))
study_section <- sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                              "Vector Biology", "Molecular Genetics",
                              "Medical Imaging", "Macromolecular Structure",
                              "Infectious Diseases", "Drug Discovery",
                              "Cognitive Neuroscience", "Aging and Geriatrics"),
                            10))
total_cost <- rnorm(100, mean = 30000, sd = 10000)
d <- data.frame(grant_id, funding_agency, study_section, total_cost)
some(d)  # some() comes from the car package; it prints a random sample of rows
   grant_id funding_agency       study_section total_cost
15       15          NINDS      Vector Biology   25242.19
19       19            NCI Infectious Diseases   29075.21
50       50            NCI      Drug Discovery   25176.35
62       62            NCI   Tumor Progression   14264.34
64       64          NIAID  Tumor Cell Biology   30024.13
I would like to create two visualizations of these data, ideally in R: one that shows how grants submitted to individual institutes are assigned to study sections, and a second that shows the dollar amount of the grants that flow from each institute to each study section. What I ultimately want is a chart like the ones on the following websites:
Migration flow
College major to job pipelines
Does anybody know of an R package and / or have some sample code to create a chart like you find on the websites above? Alternatively, is there a different visualization that I should consider that would accomplish the same goals?
Answer 1:
Here is how to do it with rCharts. You can view the final Sankey plot here.
d <- data.frame(
  id = grant_id,
  source = funding_agency,
  target = study_section,
  value = total_cost
)
# devtools::install_github("ramnathv/rCharts", ref = "dev")
library(rCharts)
sankeyPlot <- rCharts$new()
sankeyPlot$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')
sankeyPlot$set(
  data = d,
  nodeWidth = 15,
  nodePadding = 10,
  layout = 32,
  width = 750,
  height = 500,
  labelFormat = ".1%"
)
sankeyPlot
To save the chart as a standalone HTML file, you can do:
sankeyPlot$save('mysankey.html')
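If rCharts or the hosted d3 library above is not available, roughly the same diagram can be sketched with the networkD3 package. This is a swapped-in alternative, not what this answer used. The seed, the `links`/`nodes` names, and the choice of networkD3 are assumptions made for illustration; the data frame is simulated the same way as in the question.

```r
# Sketch only: networkD3 is an alternative package, not the rCharts approach above.
set.seed(1)  # hypothetical seed so the simulated data are reproducible
d <- data.frame(
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section  = sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                                "Vector Biology", "Molecular Genetics",
                                "Medical Imaging", "Macromolecular Structure",
                                "Infectious Diseases", "Drug Discovery",
                                "Cognitive Neuroscience", "Aging and Geriatrics"),
                              10)),
  total_cost     = rnorm(100, mean = 30000, sd = 10000),
  stringsAsFactors = FALSE
)

# One link per (agency, study section) pair, with the dollars summed.
links <- aggregate(total_cost ~ funding_agency + study_section, data = d, FUN = sum)

# Node table: agencies first, then study sections.
nodes <- data.frame(
  name = c(unique(as.character(links$funding_agency)),
           unique(as.character(links$study_section))),
  stringsAsFactors = FALSE
)

# sankeyNetwork() expects zero-based row indices into `nodes`.
links$source <- match(links$funding_agency, nodes$name) - 1
links$target <- match(links$study_section, nodes$name) - 1

# Draw the diagram only if networkD3 is installed.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target", Value = "total_cost",
    NodeID = "name", units = "USD",
    fontSize = 12, nodeWidth = 15, nodePadding = 10
  )
  print(p)
  # networkD3::saveNetwork(p, "mysankey.html")  # optional: write to HTML
}
```

To show the number of grants instead of dollars, the same sketch should work with `FUN = length` in the `aggregate()` call, so that each link's value is a count.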

Answer 2:
I can't help much with the visualization piece, but what you are looking for is a two-way table of the data.
Using the reshape2 package and ignoring grant_id:
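library(reshape2)
# Long format: one row per grant, with total_cost as the measured value
d1 <- melt(d[, 2:4], id.vars = c("funding_agency", "study_section"))
# Wide format: sum total_cost for each study section / agency pair
d2 <- dcast(d1, study_section ~ funding_agency, fun.aggregate = sum)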
> d2
              study_section      NCI     NHLBI     NIAID     NIGMS     NINDS
1      Aging and Geriatrics 28598.04  76524.55      0.00 109492.59 138330.12
2    Cognitive Neuroscience 76484.18  88217.42  78126.55  71546.62  73132.14
3            Drug Discovery 43667.30  39683.03  23797.24  46363.75 105655.61
4       Infectious Diseases 65375.44 136462.03  96413.08  34653.48  13835.22
5  Macromolecular Structure 84308.64  42290.61  39886.87  61645.00  67550.41
6           Medical Imaging 26264.32  86736.36 106356.13  41001.21  35549.83
7        Molecular Genetics 49473.72      0.00 110201.52  69468.03  86688.24
8        Tumor Cell Biology 99930.88  50862.39  95394.23  26269.98  46944.60
9         Tumor Progression 58719.89  52669.80  86874.89      0.00 119264.59
10           Vector Biology 64251.66  30880.81  66734.26 125524.72      0.00
This tells you how much grant money each study_section received from each funding agency. How to display this is a different question. Maybe check out http://statmath.wu.ac.at/projects/vcd/
Source: https://stackoverflow.com/questions/19730604/visualize-in-r-flow-from-one-set-of-objects-to-another
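As one possible way to display such a two-way table, here is a minimal sketch in base R. It uses `xtabs()` to build both a count table and a dollar table, then `mosaicplot()` and `barplot()` to draw them. Base graphics are an assumption made for simplicity, not the vcd package linked here; the seed and the `counts`/`dollars` names are likewise hypothetical, and the data are simulated as in the question.

```r
# Sketch only: base R graphics stand in for the vcd package mentioned above.
set.seed(1)  # hypothetical seed so the simulated data are reproducible
d <- data.frame(
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section  = sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                                "Vector Biology", "Molecular Genetics",
                                "Medical Imaging", "Macromolecular Structure",
                                "Infectious Diseases", "Drug Discovery",
                                "Cognitive Neuroscience", "Aging and Geriatrics"),
                              10)),
  total_cost     = rnorm(100, mean = 30000, sd = 10000)
)

# Number of grants for each study section / agency pair.
counts <- xtabs(~ study_section + funding_agency, data = d)

# Total dollars for each study section / agency pair.
dollars <- xtabs(total_cost ~ study_section + funding_agency, data = d)

# Mosaic plot: tile area is proportional to the number of grants.
mosaicplot(t(counts), main = "Grants by agency and study section",
           las = 2, color = TRUE)

# Stacked bars: one bar per agency, one segment per study section.
barplot(dollars, legend.text = TRUE, args.legend = list(cex = 0.6),
        las = 2, ylab = "Total cost")
```

The `dollars` table holds the same sums as the `dcast()` result, just as a `table` object, which most base plotting functions accept directly.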