Louvain community detection in R using igraph - assigns alternating group membership

Question

I have been running Louvain community detection in R using igraph, with thanks to this answer for my previous query. However, I found that the cluster_louvain method seemed to assign group membership strangely, which I think was due to an error in how I imported my data. Whilst I think I have resolved this, I would like to understand what the problem was.

I ran louvain clustering on a 400x400 correlation matrix (i.e. correlation scores for 400 individuals). When I initially imported my data, my correlation matrix had the same individuals’ ID numbers (i.e. vertex numbers) for both the row and column headings, as below:

    1     2     3     4   ... 400 
1   0     0.8   0.7   0.1 
2   0.8   0     0.6   0.3
3   0.7   0.6   0     0.9
4   0.1   0.3   0.9   0                    
...
400

This correlation matrix was saved in a "Correlations.csv" file, which I imported using read.csv. I then used the code below to convert it to a distance matrix, set entries whose correlation fell below a threshold (0.33) to zero, turn the result into an adjacency matrix for igraph, and run cluster_louvain (this code is also provided in the answer here):

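library(igraph)
library(psych)   # assumed source of cor2dist()

# Import the correlation matrix; the first column holds the row IDs
correlationmatrix <- read.csv("Correlations.csv", header = TRUE,
                              row.names = 1, check.names = FALSE)

# Convert correlations to distances, then zero out pairs with correlation < 0.33
distancematrix <- cor2dist(correlationmatrix)
DM2 <- as.matrix(distancematrix)
DM2[correlationmatrix < 0.33] <- 0

# Build a weighted undirected graph and run Louvain clustering
G2 <- graph.adjacency(DM2, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain <- cluster_louvain(G2)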

sizes(clusterlouvain)
Community sizes
  1   2
200 200

I then wanted to get the cluster number beside each ID number, to know which individual belonged to each community. So I used IDs_cluster <- cbind(V(G2)$name, clusterlouvain$membership). This gave the list of vertex IDs but the membership beside them was listed as ‘1 2 1 2 1 2 1 2’, which obviously was not right (as we would not expect every alternate individual in the dataset to be assigned to a different community):

ID  Membership
1   1
2   2 
3   1
4   2
5   1
6   2
…
400 2
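
For pairing each ID with its community, a data frame may be easier to inspect than cbind, which coerces the names and the memberships into a single character matrix. The following is only a minimal sketch on a hypothetical toy graph (two triangles joined by one edge), not the 400-individual data; the object names g, cl and ids_cluster are made up for illustration.

```r
library(igraph)

# Hypothetical toy graph: two triangles joined by a single edge
g <- graph_from_literal(a-b-c-a, d-e-f-d, c-d)
cl <- cluster_louvain(g)

# membership() returns one community index per vertex, in vertex order,
# so it lines up with V(g)$name
ids_cluster <- data.frame(ID = V(g)$name,
                          Membership = as.integer(membership(cl)))
print(ids_cluster)

# Tabulate community sizes as a cross-check against sizes(cl)
table(ids_cluster$Membership)
```

On the real data, G2 and clusterlouvain would take the place of g and cl.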

From looking at other datasets I realised the problem might have been because the row headings in my correlation matrix were numerical. So I changed the correlation matrix so that the row headings were still the ID numbers, but the column headings were 'V1-V400':

    V1    V2    V3    V4   ... V400 
1   0     0.8   0.7   0.1 
2   0.8   0     0.6   0.3
3   0.7   0.6   0     0.9
4   0.1   0.3   0.9   0                    
...
400

I imported this as a .csv file and re-ran ‘cluster_louvain’, as below:

# Same pipeline as before, applied to the file with V1-V400 column headings
correlationmatrix_V <- read.csv("Correlations_withV.csv", header = TRUE,
                                row.names = 1, check.names = FALSE)

distancematrix_V <- cor2dist(correlationmatrix_V)
DM2_V <- as.matrix(distancematrix_V)
DM2_V[correlationmatrix_V < 0.33] <- 0

G2_V <- graph.adjacency(DM2_V, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain_V <- cluster_louvain(G2_V)

Now when I reran cluster_louvain, it generated a more sensible result of three clusters, with group membership to each cluster looking more like what we would expect:

sizes(clusterlouvain_V)
Community sizes
  1   2   3
168  52 180

IDs_cluster <- cbind(V(G2_V)$name, clusterlouvain_V$membership)
View(IDs_cluster)
ID  Membership
1   1
2   1 
3   3
4   2
5   2
6   2
…
400 1
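
One way to narrow this down might be to check whether the two imported matrices differ in anything other than their column headings. The sketch below uses small hypothetical stand-ins (cm_numeric and cm_V, both invented for illustration) in place of correlationmatrix and correlationmatrix_V, and compares the values after stripping the dimnames.

```r
# Hypothetical 3x3 stand-ins for the two imported correlation matrices
vals <- matrix(c(0,   0.8, 0.7,
                 0.8, 0,   0.6,
                 0.7, 0.6, 0), nrow = 3, byrow = TRUE)
ids <- c("1", "2", "3")

cm_numeric <- as.data.frame(matrix(vals, nrow = 3, dimnames = list(ids, ids)))
cm_V <- as.data.frame(matrix(vals, nrow = 3,
                             dimnames = list(ids, paste0("V", ids))))

# Compare the numeric content only, ignoring row and column names
same_values <- isTRUE(all.equal(unname(as.matrix(cm_numeric)),
                                unname(as.matrix(cm_V))))
same_values

# Compare the headings separately
identical(colnames(cm_numeric), colnames(cm_V))
```

If the values turn out to be identical on the real data, the difference would have to arise after import rather than in the files themselves.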

My question is: what happened when the row and column headings were the same, such that group membership alternated between individuals (i.e. '1 2 1 2' down the ID list, as in the first example), and why was this resolved by changing the column headings to a non-numerical format (as in the second example)?

This may be a simple mistake: when importing the .csv of the correlation matrix using ‘read.csv’, I may not have used the correct settings, given that my column headings were also numerical.
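
A quick sanity check on the import could look something like the sketch below. It writes a hypothetical 3x3 file with numeric headings to a temporary path (standing in for Correlations.csv), reads it back with the same read.csv settings, and checks that the result is square, that the row and column IDs line up, and that the matrix is symmetric.

```r
# Hypothetical 3x3 stand-in for Correlations.csv with numeric IDs as headings
tmp <- tempfile(fileext = ".csv")
writeLines(c(",1,2,3",
             "1,0,0.8,0.7",
             "2,0.8,0,0.6",
             "3,0.7,0.6,0"), tmp)

m <- read.csv(tmp, header = TRUE, row.names = 1, check.names = FALSE)
mat <- as.matrix(m)

dim(mat)                                  # should be square
identical(rownames(mat), colnames(mat))   # row and column IDs should line up
isSymmetric(mat)                          # a correlation matrix should be symmetric
is.numeric(mat)                           # values should not have been read as text
```

Any check that fails on the real file would point to the import step rather than to cluster_louvain itself.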

However, I would like to understand why this meant ‘cluster_louvain’ assigned group membership in the way it did. I am posting this in case it may be useful if anyone makes the same mistake I did above. Any insights would be welcome, and thank you for any advice!

Source: https://stackoverflow.com/questions/49856205/louvain-community-detection-in-r-using-igraph-assigns-alternating-group-member

Tags

cluster-analysis

igraph