Improper display when using geom_net in R

Submitted by 大憨熊 on 2020-01-14 10:27:30

Question


Given a data frame as follows:

v1     v2     v3     v4
Tom     A     Jim     B
Gary    A     Shirly  A
Shirly  B     Jack    B
Tom     A     Jack    B
...

v2 and v4 denote the group that the name in v1 and v3, respectively, belongs to. For example, in the first row Tom belongs to group A and Jim belongs to group B. I'd like to plot a social network with geom_net, with a line linking two names if they are in the same row, for instance Tom and Jim. The size of each edge should be proportional to the number of times its name appears in V3, i.e., the edge of Jack should be twice as big as those of Jim and Shirly.

I tried

ggplot(df, aes(from_id = V1, to_id = V3)) + geom_net()

But a very bad result is given:

And a warning is generated:

In f(..., self = self) :
There are 35 nodes without node information:
#And the below are all the values in V1 and V3
Tom, Shirly, ....
Did you use all=T in merge?

I wonder how to show the result in a proper, good-looking way, with no x-axis or y-axis and with the relationships among edges clearly shown. The edges' color should represent the group they belong to; that is, all names in the same group should have the same color.

Hope to get your help! Thanks in advance!


Answer 1:


I struggled with this too until I figured out the data.frame structure that geom_net (from the geomnet package) expects. Basically, you need a data.frame with two parts. In part 1 you describe the edges (the lines drawn) by providing a FROM and a TO column. Optionally, additional information can be provided in a separate column, e.g., linewidth.

ans <- read.table(text ="
from to linewidth
Tom Jim 0.1
Gary Shirly 1
Shirly Jack 0.5
Tom Jack 2
", sep = " ", stringsAsFactors = FALSE, header=TRUE)

# geom_net() comes from geomnet, which builds on ggplot2
library(ggplot2)
library(geomnet)

p <- ggplot(data = ans, aes(from_id = from, to_id = to))
p + geom_net(label = TRUE, vjust = -1)

But you will notice that some of the nodes (vertices) are not labelled. So this is where part 2 of the data.frame is important. In part 2 you supply the names of the nodes to be labelled. This is because geom_net only labels the FROM node and not the TO node, so you will need to supply, as a minimum, the names of the nodes that are not used as a FROM point.

ans <- read.table(text ="
from to linewidth
Tom Jim 0.1
Gary Shirly 1
Shirly Jack 0.5
Tom Jack 2
Helen Jack 3
Jim NA NA
Jack NA NA
", sep = " ", stringsAsFactors = FALSE, header=TRUE, na.strings = "NA")

p <- ggplot(data = ans, aes(from_id = from, to_id = to, linewidth = linewidth))
p + geom_net(label = TRUE, vjust=-1)

Several things are going on above: 1) I added the rows "Jim NA NA" and "Jack NA NA" to provide labels for the previously unlabeled nodes (the block also contains one extra edge, Helen to Jack), 2) I added na.strings = "NA" to ensure that read.table() properly interprets the NA values, and 3) I added linewidth to the aes so that the column in the data.frame is mapped to the plot.

Also, once you supply names for all the nodes, the warning message "There are XX nodes without node information" goes away.
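If you don't want to type the padding rows by hand, here is a minimal base R sketch of the same idea. The names edges, missing_nodes and padding are just illustrative choices of mine, not anything geomnet requires; the data are the edges from the example above.

```r
# Edge list from the example above, without the manually added NA rows
edges <- data.frame(
  from = c("Tom", "Gary", "Shirly", "Tom", "Helen"),
  to = c("Jim", "Shirly", "Jack", "Jack", "Jack"),
  linewidth = c(0.1, 1, 0.5, 2, 3),
  stringsAsFactors = FALSE
)

# Nodes that appear as a TO node but never as a FROM node
missing_nodes <- setdiff(edges$to, edges$from)

# One padding row per such node, with NA in the edge columns
padding <- data.frame(
  from = missing_nodes,
  to = NA_character_,
  linewidth = NA_real_,
  stringsAsFactors = FALSE
)

# Same structure as the hand-written data.frame above
ans <- rbind(edges, padding)
print(ans)
```

Here missing_nodes comes out as Jim and Jack, so ans ends up with the same seven rows as the hand-written version.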

Hope that helps.

Edit: as requested, I added the resulting output. Since geom_net() changes the layout each time it is run, I have included two example images.


Just to complete the whole data.frame building process, I have included below a case where you have two separate data.frames and you need to merge them together: first data.frame is for the lines (edges) and the second is the nodes (vertices).

lines <- read.table(text ="
from to linewidth
Tom Ivy 0.1
Gary Ivy 1
Shirly Ivy 0.5
Tom Helen 2
Helen Ivy 3
", sep = " ", stringsAsFactors = FALSE, header=TRUE, na.strings = "NA")

nodes <- read.table(text ="
name
Tom
Jim
Gary
Shirly
Jack
Helen
Susan
Joel
Ivy
", sep = " ", stringsAsFactors = FALSE, header=TRUE,na.strings = "NA")

df <- merge(lines, nodes, by.x = "from", by.y = "name", all = TRUE)

p <- ggplot(data = df, aes(from_id = from, to_id = to, linewidth = linewidth))
p + geom_net(label = TRUE, vjust=-1)
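For the four-column layout in the question, the same merge approach might look roughly like the sketch below. This is only a sketch under a few assumptions: the columns are named V1 to V4 as in the question's ggplot call, the edge weight is the number of times the target name appears in V3, and when a name shows up with two different groups (Shirly does in the sample rows) the first group seen is kept. The names raw, edges, nodes, net, weight and group are mine.

```r
# The four example rows from the question
raw <- data.frame(
  V1 = c("Tom", "Gary", "Shirly", "Tom"),
  V2 = c("A", "A", "B", "A"),
  V3 = c("Jim", "Shirly", "Jack", "Jack"),
  V4 = c("B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Edge weight: how often the target name occurs in V3
counts <- table(raw$V3)
edges <- data.frame(
  from = raw$V1,
  to = raw$V3,
  weight = as.integer(counts[raw$V3]),
  stringsAsFactors = FALSE
)

# Node table: every name with its group, keeping the first group seen per name
nodes <- data.frame(
  name = c(raw$V1, raw$V3),
  group = c(raw$V2, raw$V4),
  stringsAsFactors = FALSE
)
nodes <- nodes[!duplicated(nodes$name), ]

# Full outer merge so that names occurring only in V3 still get a row
net <- merge(edges, nodes, by.x = "from", by.y = "name", all = TRUE)
print(net)
```

The resulting net can be passed to ggplot() the same way as df above, with linewidth = weight in the aes. As far as I know the geomnet documentation examples also map colour to a node attribute and add theme_net() to drop the axes, but I have not verified that against this data, so please check it with your installed version.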




Answer 2:


Maintainer of geomnet here. If possible, please post future questions to github.com/sctyner/geomnet/issues. @hackR has the right idea, of which there are several examples in the documentation. The idea is: you have an edges data frame that has a from_id and a to_id column (plus additional columns), and you also have a vertices data frame with an id column (plus additional columns). Then you merge them:

network_data <- merge(edges, vertices, by.x = "from_id", by.y = "id", all = T)

Don't forget to include the all = T argument!
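To see why that argument matters, here is a small base R sketch with made-up toy data. It assumes the vertices key column is called id, as described above; inner and outer are just illustrative names.

```r
# Toy edges and vertices (hypothetical data, only for illustration)
edges <- data.frame(
  from_id = c("Tom", "Gary"),
  to_id = c("Ivy", "Ivy"),
  stringsAsFactors = FALSE
)
vertices <- data.frame(
  id = c("Tom", "Gary", "Ivy", "Joel"),
  stringsAsFactors = FALSE
)

# Default merge: only vertices that occur as from_id survive
inner <- merge(edges, vertices, by.x = "from_id", by.y = "id")

# all = TRUE: vertices without an outgoing edge are kept, with NA in to_id
outer <- merge(edges, vertices, by.x = "from_id", by.y = "id", all = TRUE)

print(nrow(inner))  # 2
print(nrow(outer))  # 4
print(outer)
```

Without all = TRUE, Ivy and Joel are dropped from the merged data, and those missing rows are what the "nodes without node information" warning is about.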

Thanks, Sam.



Source: https://stackoverflow.com/questions/34976716/inproper-show-when-use-geom-net-in-r
