Tuesday, November 26, 2013

ERGM: Creating Large Fully-Connected Network objects in Statnet

TL;DR

Use the following approach:

net <- network(matrix(1, n, n), directed=FALSE)  # n = number of nodes
(At least with Statnet 1.7 / R64 2.13)

In more depth

Statnet seems optimized for sparsely connected graphs. This is not too surprising: many of the "real" graphs I deal with have a density around d = 0.0002, and even some fairly large small-world graphs, like IMDB, only have a density of ~0.18. However, there is one special case where I need a fully connected graph: the input to an edgecov() term in an ERGM. This graph has to contain all possible edges, not just the observed ones, so its density is 1.0.
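To make the density figures concrete, here is a minimal sketch in base R. The helper names possible_edges and graph_density are my own for illustration, not Statnet functions, and the sketch assumes an undirected graph with no self-loops.

```r
# Hypothetical helpers (not part of statnet): density of an undirected
# graph with no self-loops.
possible_edges <- function(n) n * (n - 1) / 2

graph_density <- function(n_edges, n_nodes) n_edges / possible_edges(n_nodes)

# A fully connected graph has every possible edge, so its density is 1.
graph_density(possible_edges(500), 500)

# Number of edges a complete graph holds at the sizes timed in this post.
possible_edges(c(200, 500))
```

The edge count grows roughly with the square of the node count, which is why a density of 1.0 gets expensive quickly.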

One of the challenges is how to create and initialize these very large networks. The creation step would often take a very long time, and it wasn't clear I was using the best approach. At least two methods seemed plausible: build the network from an all-ones adjacency matrix created with matrix(), or call network.initialize() and then add all of the edges afterwards. It was not clear up front which would be faster, so I ran a quick experiment: I ran each method 50 times on graphs of various sizes and compared the results.

# Method 1: network(matrix())
library(statnet)  # provides network() and network.initialize()
startTime <- proc.time()
for (i in 1:50) net <- network(matrix(1, 200, 200), directed=FALSE)
proc.time() - startTime

# Method 2: network.initialize() and then assignment
startTime <- proc.time()
for (i in 1:50) {
  net <- network.initialize(200, directed=FALSE)
  net[,] <- 1  # set every possible edge
}
proc.time() - startTime

The results were pretty clear:

Method   200x200    500x500
#1       12.32 s    361.91 s
#2       42.42 s    8+ hours
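For anyone repeating the comparison, the two timing loops could be wrapped in a small helper built on system.time(). This is only a sketch: time_reps, from_matrix, and from_assignment are hypothetical names of mine, the network size and repetition count are deliberately small, and the comparison only runs if the network package is installed.

```r
# Hypothetical helper (not part of statnet): total elapsed seconds for
# `reps` calls of a zero-argument function `build`.
time_reps <- function(build, reps = 50) {
  timing <- system.time(for (i in seq_len(reps)) build())
  unname(timing["elapsed"])
}

# Only run the comparison if the network package is installed.
if (requireNamespace("network", quietly = TRUE)) {
  size <- 50  # small on purpose; the runs in this post used 200 and 500

  # Method 1: build from an all-ones adjacency matrix.
  from_matrix <- function() {
    network::network(matrix(1, size, size), directed = FALSE)
  }

  # Method 2: initialize an empty network, then assign every edge.
  from_assignment <- function() {
    net <- network::network.initialize(size, directed = FALSE)
    net[, ] <- 1
    net
  }

  print(c(matrix = time_reps(from_matrix, reps = 5),
          assignment = time_reps(from_assignment, reps = 5)))
}
```

Passing each method as a function keeps the timing code identical for both, so any difference comes from the construction step itself.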

I used proc.time() for the timing. There are suggestions that it is not very accurate, but the difference is so stark that even 1 s resolution is more than enough. Also, I've discovered that 32-bit R is a really bad environment for working with even "medium-sized" graphs (500 nodes or so), much less "large" ones. The extra address space afforded by the 64-bit version of R avoids a lot of out-of-memory conditions.
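A quick way to check which build of R a session is using before starting a long run, as a minimal base-R sketch (the variable name is_64bit and the warning text are my own):

```r
# .Machine$sizeof.pointer is 8 on a 64-bit build of R and 4 on a 32-bit build.
is_64bit <- .Machine$sizeof.pointer == 8

if (!is_64bit) {
  warning("Running 32-bit R; large dense networks may exhaust memory.")
}

# Architecture string for the running build.
R.version$arch
```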
