Statnet & "foreign" Vertex IDs
Networks often come from an external source, with vertices that
already have unique identifiers. Unfortunately, when I import a
network into Statnet in edgelist format, Statnet requires the vertex
identifiers to be sequential integers, beginning at one. That means
a bit of pre-processing of the data, which is not always desirable:
it requires a look-up step if I want to figure out who a particular
vertex refers to, and it makes attaching additional attributes more
complex.
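To make that cost concrete, the pre-processing might look something like the following base R sketch. The names foreign.edgelist, id.table and seq.edgelist are made up for illustration; match() does the renumbering, and id.table is the look-up table I would have to carry around afterwards.

```r
# Hypothetical edge list using "foreign" vertex identifiers
foreign.edgelist <- rbind(c(3, 4), c(4, 6),
                          c(6, 8), c(6, 9))
# Look-up table: the position in id.table is the sequential vertex ID
id.table <- sort(unique(as.vector(foreign.edgelist)))
# Renumber every endpoint to its position in id.table (1, 2, 3, ...)
seq.edgelist <- matrix(match(foreign.edgelist, id.table),
                       ncol = ncol(foreign.edgelist))
# Going back requires the look-up table: which vertex is ID 3?
id.table[3]
```

Every later question about a vertex, and every attribute I want to attach, has to go through id.table in one direction or the other.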
However, two "lightly documented" features can help out here. First, statnet already accepts textual vertex names in the standard network constructors. So, if the vertices already have (unique) text names, they can be used with no additional work:
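library(network)  # provides as.network() and the plot method
# Edge list whose vertices are identified by (unique) text names
text.edgelist <- rbind(c("tom_", "dick7"), c("dick7", "harry"),
                       c("harry", "jill"), c("harry", "jane"))
n_alpha <- as.network(text.edgelist, matrix.type='edgelist', directed=FALSE)
plot(n_alpha, displaylabels=TRUE)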
If the identifiers are numbers, the simple approach is to convert
the numeric vertex identifiers to strings (of digits), and use those
strings as the input to as.network:
num.edgelist <- rbind(c(3, 4), c(4, 6),
                      c(6, 8), c(6, 9))
# Map the numeric edge list to characters
char.edgelist <- apply(num.edgelist, c(1,2), as.character)
n_alpha <- as.network(char.edgelist, matrix.type='edgelist', directed=FALSE)
plot(n_alpha, displaylabels=TRUE)
This approach works and, depending on the data and goals, may be the
easiest one. There's also a second feature in statnet that can
solve this problem: persistent IDs. Persistent IDs are part of the
networkDynamic package, and provide a way to attach unique values to
vertices and edges that don't change regardless of any network
manipulation.
Persistent IDs (at least to me) "feel" like the better approach, but they are also more complex to use in practice. In particular, adding edges is not as easy as you'd like. To use persistent IDs, I first have to create the network, then initialize it for use with persistent IDs. In this example, I start with a (mostly) empty network, and then add all of the vertices and edges to it. I'm assuming that the edgelist is already loaded in num.edgelist and that it has two columns, a tail and a head. The network has to be initialized with at least one node, otherwise the initialize.pids() call is not "sticky."
library(networkDynamic)  # provides initialize.pids() and get.vertex.id()
net.pid <- network.initialize(1, directed=FALSE)
initialize.pids(net.pid)
The next step is to add all of the vertices. This step is
straightforward. First, I create a list of the unique node
identifiers (unique() is part of the base R distribution). Then I
use that list to add the vertices and initialize them with the
persistent IDs.
node.list <- unique(c(unique(num.edgelist[,1]),
unique(num.edgelist[,2])))
add.vertices(net.pid, length(node.list), vertex.pid = node.list)
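As a small aside, the same list can also be built with a single unique() call, because as.vector() flattens a matrix column by column, so the tails come before the heads just as above. A base R sketch, with num.edgelist redefined so the snippet runs on its own and node.list.alt as a made-up name:

```r
num.edgelist <- rbind(c(3, 4), c(4, 6),
                      c(6, 8), c(6, 9))
# The two-step version used above
node.list <- unique(c(unique(num.edgelist[,1]),
                      unique(num.edgelist[,2])))
# One-step version: flatten column by column, then de-duplicate
node.list.alt <- unique(as.vector(num.edgelist))
# Both keep identifiers in order of first appearance
identical(node.list, node.list.alt)
```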
Unfortunately, there is no add.edges() function that takes an edge
defined in terms of the persistent IDs. This is not a huge problem:
it's only a few lines of code to map a list of persistent IDs to the
internal node representation, using the existing get.vertex.id
function. Using apply, it's one (logical) line of R code, which gives
the edgelist "mapped" to the internal vertex IDs. This list can be
passed directly to add.edges():
# Translate every persistent ID in the edge list to its internal vertex ID
mapped.edges <- apply(num.edgelist, c(1,2),
                      function(v.pid, net) {
                        get.vertex.id(net, v.pid)
                      }, net.pid)
add.edges(net.pid, mapped.edges[,1], mapped.edges[,2])
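For intuition, here is roughly what that mapping amounts to, sketched in base R without statnet. It assumes the vertices were added in node.list order after the single placeholder vertex, so the placeholder has internal ID 1 and the real vertices follow it. The names pid.to.id and mapped.sketch are made up, and this is not a claim about how get.vertex.id is implemented.

```r
num.edgelist <- rbind(c(3, 4), c(4, 6),
                      c(6, 8), c(6, 9))
node.list <- unique(as.vector(num.edgelist))
# Assumption: internal ID 1 is the placeholder vertex, so the real
# vertices occupy internal IDs 2, 3, ... in node.list order
pid.to.id <- function(v.pid) match(v.pid, node.list) + 1
# Same shape as the edge list, with each persistent ID replaced
mapped.sketch <- apply(num.edgelist, c(1,2), pid.to.id)
mapped.sketch
```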
There's one last bit of clean-up to do. At the beginning, I created
the network with one node. This was done to make sure that the
initialize.pids() call was "sticky." I need to remove that initial
node. Since it was the first node added, I know the id is 1:
delete.vertices(net.pid, 1)
Finally, it's possible to plot the graph using the persistent IDs as
labels. Just use the %v% operator to extract the vertex.pid
attribute:
plot(net.pid, label=net.pid %v% 'vertex.pid')