This all began with an introductory presentation about social network analysis to a group of medical students. What better way to grab their attention than with attractive, fake doctors having sex on television? Naturally this led to the dense network of sexual contacts between characters on the Grey’s Anatomy television show. After viewing many hours of previous episodes and exploring fan pages (especially here for an early attempt at a graph representation of sexual contacts), I was able to come up with an extensive but by no means exhaustive list of contacts. The edge list is available here.

This example uses the igraph package for R, both free to download. First we create the graph, give it a layout, and plot.

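```
library(igraph)
# read the edge list: one row per sexual contact between two characters
ga.data <- read.csv('ga_edgelist.csv', header=TRUE)
# build an undirected graph from the two columns of names
g <- graph.data.frame(ga.data, directed=FALSE)
summary(g)
# store a force-directed layout on the graph so later plots reuse it
g$layout <- layout.fruchterman.reingold(g)
plot(g)
```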
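For readers without the CSV at hand, the same construction can be tried on a tiny inline edge list. This is only a sketch: the vertex names and the `toy` objects are made up for illustration, and it uses `graph_from_data_frame`, the name that newer igraph releases give to `graph.data.frame`.

```r
library(igraph)

# Hypothetical edge list standing in for ga_edgelist.csv:
# the first two columns are read as the endpoints of each edge.
toy.data <- data.frame(from = c("a", "a", "b"),
                       to   = c("b", "c", "d"),
                       stringsAsFactors = FALSE)

# Undirected graph with one vertex per distinct name
toy <- graph_from_data_frame(toy.data, directed = FALSE)

vcount(toy)  # number of vertices
ecount(toy)  # number of edges
```

Any further columns in the data frame would be kept as edge attributes, which is handy if the edge list later gains, say, a season or episode column.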

Without knowing who is represented by each vertex, what can you deduce from the graph? From a public health perspective, if you could test one person for sexually transmitted infections (STIs), who would it be? If you could provide counseling and a free box of condoms to one person, who would it be? If you knew that an epidemic was spreading through this network, who would you want to be to best avoid it?

Now let’s make the visualization a little more interesting. First we can remove the labels for now, and then change the size of the vertex to represent the degree, or degree centrality, corresponding to the number of partners of each vertex. In the context of transmissible infections, this would indicate the number of people a person could infect or be infected by through sexual contact.

```
V(g)$label <- NA # remove labels for now
V(g)$size <- degree(g) * 2 # multiply by 2 for scale
plot(g)
```
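To make the meaning of degree concrete, here is a minimal sketch on a toy star graph (not the Grey's Anatomy data; the `star` object is invented for illustration). One hub is linked to four leaves, so the hub should stand out exactly the way a vertex with many partners does in the plot above.

```r
library(igraph)

# Toy star: vertex 1 is linked to each of the other four vertices
star <- make_star(5, mode = "undirected")

deg <- degree(star)
deg  # the hub has degree 4, every leaf has degree 1

# Same scaling idea as above: vertex size proportional to degree
V(star)$size <- deg * 2
```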


This tells us about the absolute number of partners, but not much about the relative position within the network. Let’s examine two types of centrality: closeness and betweenness. The closeness centrality of a vertex is the reciprocal of its total shortest-path distance to every other vertex on the graph (equivalently, up to scaling, the inverse of its average distance). A high number indicates that a vertex is quickly reachable by the majority of vertices in the graph, while a low number indicates that the vertex is far from most other vertices on the graph. We can calculate the centrality and then rescale the values to create a color scheme to visualize the relative differences.

```
clo <- closeness(g)
# rescale values to match the elements of a color vector
clo.score <- round( (clo - min(clo)) * length(clo) / max(clo) ) + 1
# create color vector, use rev to make red "hot"
clo.colors <- rev(heat.colors(max(clo.score)))
V(g)$color <- clo.colors[ clo.score ]
plot(g)
```
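What closeness measures can be checked by hand on a small example. The sketch below assumes a toy path graph of five vertices (not the show's network) and compares igraph's default, unnormalized closeness with the reciprocal of each vertex's summed shortest-path distances. The object names are invented for illustration.

```r
library(igraph)

# Path graph 1 - 2 - 3 - 4 - 5 (a ring with the closing edge left out)
p <- make_ring(5, circular = FALSE)

# Total shortest-path distance from each vertex to all the others
total.dist <- rowSums(distances(p))

# With igraph's default settings, closeness is the reciprocal of that total
clo.manual <- 1 / total.dist
clo.igraph <- closeness(p)

all.equal(as.numeric(clo.manual), as.numeric(clo.igraph))
which.max(clo.igraph)  # the middle vertex is closest to everyone else
```

The end vertices have a total distance of 10 and the middle vertex a total of 6, so the middle one gets the highest score.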

It appears there are a few vertices on the red “hot” end of the spectrum, and a few at the “cold” end. Next we do the same for each vertex to calculate the betweenness centrality, which is the number of shortest paths on the network that pass through the vertex. Vertices with high betweenness centrality might be thought of as serving a gatekeeper role in mediating the shortest connections between other vertices.

```
btw <- betweenness(g)
btw.score <- round(btw) + 1
btw.colors <- rev(heat.colors(max(btw.score)))
V(g)$color <- btw.colors[ btw.score ]
plot(g)
```
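Betweenness is also easy to verify on a toy graph. This sketch again assumes a five-vertex path rather than the real data: every shortest path between two vertices on opposite sides of a vertex has to pass through it, so the counts can be worked out by eye.

```r
library(igraph)

# Toy path graph: 1 - 2 - 3 - 4 - 5
p <- make_ring(5, circular = FALSE)

# Number of shortest paths between other pairs that pass through each vertex
btw <- betweenness(p)
btw

# Vertex 3 lies on the paths 1-4, 1-5, 2-4 and 2-5 (4 paths);
# vertex 2 lies on 1-3, 1-4 and 1-5 (3 paths); the endpoints lie on none.
```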

This last graph of betweenness indicates slightly more variation among the likely suspects, while the analysis of closeness centrality demonstrated less variation. Why?

A useful technique in social network analysis is the use of community finding algorithms. Here we use the implementation of the Girvan-Newman algorithm (paper here) to detect the underlying community structure of the graph. We will iterate through each merge to determine which cut produces the maximum modularity, and then use that number to calculate the groups.

```
gnc <- edge.betweenness.community(g, directed=FALSE)
# modularity after each number of merge steps, starting from 0
m <- vector()
for (s in 0:nrow(gnc$merges) ) {
  memb <- community.to.membership(g, gnc$merges, steps=s)$membership
  m <- c(m, modularity(g, memb, weights=NULL))
}
# first step count that reaches the maximum modularity
ideal_steps <- which.max(m) - 1
plot(0:(length(m)-1), m, col="blue", xlab="Steps", ylab="Modularity")
gn.groups <- community.to.membership(g, gnc$merges, steps=ideal_steps)$membership
V(g)$color <- gn.groups
V(g)$size <- 15 # reset to default size
plot(g)
```
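The same search for the maximum-modularity cut can be sketched with the function names used by newer igraph releases (`cluster_edge_betweenness`, `cut_at`). The example assumes a toy graph of two triangles joined by a single bridge, chosen because the expected answer (two groups) is obvious. It is an illustration, not the code used for the plots in this post.

```r
library(igraph)

# Two triangles (1-2-3 and 4-5-6) joined by a single bridge edge 3-4
toy <- make_graph(c(1,2, 2,3, 1,3,  4,5, 5,6, 4,6,  3,4), directed = FALSE)

# Girvan-Newman: repeatedly remove the edge with the highest betweenness
gn <- cluster_edge_betweenness(toy)

# Modularity of every cut of the dendrogram, from 1 group up to 6 groups
n.groups <- 1:vcount(toy)
mod <- sapply(n.groups, function(k) modularity(toy, cut_at(gn, no = k)))

best   <- n.groups[which.max(mod)]  # number of groups with maximum modularity
groups <- cut_at(gn, no = best)     # membership vector at that cut

V(toy)$color <- groups              # color vertices by community
```

In these newer releases, `membership(gn)` should return the maximum-modularity grouping directly, so the explicit scan is mainly useful for plotting modularity against the number of groups.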

Once you see the graph with names, it is interesting to note the breaks in connectivity around race and age (I guess you have to know the TV characters to appreciate this). So before seeing the names, back to the original question. Who would you test? Who would you counsel? Who would you vaccinate? Who would you rather be?

And the winners are…

```
V(g)$color <- 'grey'
V(g)$label <- V(g)$name
V(g)$label.cex <- 0.7 # rescale the text size of the label
plot(g)
```

It would be:

1. Karev

2. Sloan

In the order of appearance if they were to be tested. BTW good and interesting post.

Agreed. For percolation of disease, closeness trumps degree centrality for a first indicator node. So I would test Karev first. But Sloan has a higher betweenness centrality than Karev, and so I would vaccinate/prophylax Sloan first.

Cool blog! Thanks for the very good and painless tutorial on how to get a coloring by groups in the modularity algorithm – I’m not sure how long it would have taken me to find this out with the igraph documentation. I’m just preparing a lecture on the design of network analytic methods. May I use your data? I’m also writing a book on this topic, would you be okay with using your data for this as well? If so, please just let me know how to reference it!

Best,

Nina

Hi Nina,

Thanks for your comments, I’m glad you found it helpful! All the content here is under a creative commons license, so feel free to reuse with attribution in your lecture and book. You could cite it something like:

Weissman, GE. “Grey’s Anatomy Network of Sexual Relations” March 25, 2011. Available online at http://www.babelgraph.org/wp/?p=1

Thanks for this post, this will come in handy when I have to explain parts of my phd to non-sna people

I think one problem with your “whom to test/vaccinate”-question is that the time dimension is also relevant as the diseases would only spread via newly established or “reused” connections. So each person connects only to the parts of the graph which had already established connections when the new connection occurred.

Hi Jan,

Thanks for your comment. It is fun to think about ways to introduce the language of SNA to newcomers.

You’re definitely right that the example does not address the time dimension. Since disease can only be propagated through existing connections, and the graph above is a graph of cumulative connections, it doesn’t accurately reflect the actual network of potential transmissions. The group at University of Washington put together an incredible video on transmission of disease and concurrency in an evolving graph. Good luck with your PhD!
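The point about time can be illustrated with a few lines of base R. This is a rough sketch on made-up data: the `contacts` table, its `time` column and the `reachable_in_time` helper are all hypothetical, and it makes the worst-case assumption that every contact after infection transmits.

```r
# Hypothetical contacts with the time at which each one happened
contacts <- data.frame(from = c("B", "A"),
                       to   = c("C", "B"),
                       time = c(1, 2),
                       stringsAsFactors = FALSE)

# Who could a seed case infect if contacts are replayed in time order?
reachable_in_time <- function(contacts, seed) {
  infected <- seed
  contacts <- contacts[order(contacts$time), ]
  for (i in seq_len(nrow(contacts))) {
    pair <- c(contacts$from[i], contacts$to[i])
    # transmission needs one endpoint to be infected already
    if (any(pair %in% infected)) infected <- union(infected, pair)
  }
  sort(infected)
}

reachable_in_time(contacts, "A")  # "A" "B": the B-C contact came too early
reachable_in_time(contacts, "C")  # "A" "B" "C": the chain runs forward in time
```

In the cumulative graph A, B and C form one connected chain, yet an infection starting at A never reaches C, which is exactly the gap between the static picture and the network of potential transmissions.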

Excellent post! Great example to illustrate the concepts. As a social scientist I ask these follow-up questions:

1. Who would you test? – for what? Pregnancy or HIV or BV?

2. Who would you counsel? – again, for what? Safer sex or increasing chances of conceiving? (remember Derek and Meredith want a child)

3. Who would you vaccinate? – Against what? HPV?

4. Who would you rather be? – The one who tests negative

Laura, great questions. Further evidence that actually knowing something about your patient and their social situation, life, preferences, values, etc, makes a big difference in how care is delivered!

Cool post! I’ve never watched GA nor do I have much clue about SNA. Something I noticed is that there are no triangles in the graph (ie menage a trois) which got me thinking are there ways to conclude from the graph alone whether the social network includes bisexual relationships or not?

That’s a great question: math and sociology all in one. I’ve never thought about that, but my hunch is this: if the graph is truly bipartite (contains no odd length cycles), then there can be no same-gender relationships (unless all relationships in the graph are same-gender). Conversely, if the graph is not bipartite, there must be a same-gender relationship.

As for triangles, assuming gender is represented by a binary space, a triangle could consist of either two vertices of one gender and one of the other (one bisexual actor), or three of the same gender (three homosexual actors).

There is one bisexual actor in the graph, but no triangles because her partners did not sleep with each other. At least not yet. Maybe I have to watch more episodes.
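The bipartite hunch can be tried out in igraph. The sketch below uses two toy cycles rather than the show's graph, and relies on `bipartite_mapping` and `count_triangles` from newer igraph releases. Whether a two-coloring corresponds to gender is of course the binary-gender assumption made above, not something the code can establish.

```r
library(igraph)

square   <- make_ring(4)  # 4-cycle: even length, so it can be two-colored
triangle <- make_ring(3)  # 3-cycle: odd length, so it cannot

bipartite_mapping(square)$res    # TRUE
bipartite_mapping(triangle)$res  # FALSE

# Number of triangles each vertex takes part in
count_triangles(square)    # 0 0 0 0
count_triangles(triangle)  # 1 1 1
```

Note that the square has no triangles and is bipartite, but a 5-cycle would also have no triangles while still containing an odd cycle, so the absence of triangles alone does not settle the question.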