New season of Grey’s Anatomy with Exponential Random Graph Models

In a previous post we used the web of sexual contacts among characters on the Grey’s Anatomy television show to look at some social network analysis using R. To celebrate the beginning of the new season, Ben Lind has put together a Grey’s Anatomy tutorial on exponential random graph (ERG) models. Who knew popular culture could teach us so much science? Enjoy.

Posted in Modeling, R, Social Networks, Uncategorized | Comments Off

Rcpp is smoking fast for agent-based models in data frames

In a previous post, I discussed different approaches to speeding up some loops in data frames. In particular, R data frames provide a simple framework for representing large cohorts of agents in stochastic epidemiological models, such as those representing disease transmission. This approach is much easier and likely faster than trying to implement cohorts of R objects. In this post we’ll explore a simple agent-based model, and then benchmark a few different approaches to iterating through the cohort. Rcpp outperforms all of them by a few orders of magnitude. Priceless.

Case

Let’s say we are trying to predict the probability of someone choosing to receive a vaccination in a given year. The decision will be based on their age (age), gender (female), and whether or not they were infected with the virus last year (ily). Let’s make up some data:
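The made-up data itself is not shown in this excerpt. As a minimal sketch, assuming a hypothetical cohort size, invented distributions for age, female, and ily, and illustrative logistic coefficients (none of these come from the original post), a cohort like this could be built as follows:

```r
set.seed(42)   # reproducible made-up data
n <- 1000      # hypothetical cohort size

# One row per agent; every distribution here is an assumption
cohort <- data.frame(
  age    = sample(18:90, n, replace = TRUE),
  female = rbinom(n, 1, 0.5),
  ily    = rbinom(n, 1, 0.1)
)

# Illustrative logistic model: the coefficients are invented, not estimated
vaccinate_prob <- function(age, female, ily) {
  plogis(-3 + 0.03 * age + 0.2 * female + 1.0 * ily)
}

# Vectorised over the whole data frame: one probability per agent
cohort$p_vacc <- vaccinate_prob(cohort$age, cohort$female, cohort$ily)
head(cohort)
```

Keeping the probability in its own function means the same rule can be called from a loop, an apply-style call, or a vectorised expression when comparing approaches.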

Continue reading

Posted in Health, Medicine, Modeling, R | 20 Comments

Calculating the mixing matrix and assortativity coefficient with igraph in R

The mixing matrix of a graph gives the density of edges between vertices with different characteristics. The mixing matrix for a given igraph object can be calculated using the following function:
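The function from the post is not shown in this excerpt. The sketch below is a base-R stand-in rather than the author's igraph version: it assumes an undirected graph given as a two-column matrix of vertex indices (edges) plus a vector with one category per vertex (types). Both names, and the helper assortativity_coef, are hypothetical. With an igraph object, inputs of this shape could presumably be obtained from as_edgelist(g, names = FALSE) and a vertex attribute.

```r
# Mixing matrix: fraction of edge ends joining a vertex of type i to type j
mixing_matrix <- function(edges, types) {
  types <- factor(types)
  from <- types[edges[, 1]]            # type of each edge's first endpoint
  to   <- types[edges[, 2]]            # type of each edge's second endpoint
  counts <- unclass(table(from, to))   # square because both share levels
  counts <- counts + t(counts)         # undirected: count each edge both ways
  counts / sum(counts)
}

# Newman's assortativity coefficient computed from a mixing matrix
assortativity_coef <- function(e) {
  a <- rowSums(e)
  b <- colSums(e)
  (sum(diag(e)) - sum(a * b)) / (1 - sum(a * b))
}

# Toy graph: two A-vertices, two B-vertices, edges 1-2, 3-4 and 2-3
edges <- matrix(c(1, 2,
                  3, 4,
                  2, 3), ncol = 2, byrow = TRUE)
types <- c("A", "A", "B", "B")

e <- mixing_matrix(edges, types)
r <- assortativity_coef(e)
```

Here two of the three edges join vertices of the same type, so the diagonal of e carries most of the weight and r is positive.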

Continue reading

Posted in Modeling, R, Social Networks | 4 Comments

R can write R code, too

In a recent blog post by CMastication, a little meme puzzle is presented with the introduction that a preschooler could solve it in 5-10 minutes, a programmer in an hour. I took the bait.

The original problem goes like this:

8809=6
7111=0
2172=0
6666=4
1111=0
3213=0
7662=2
9313=1
0000=4
2222=0
3333=0
5555=0
8193=3
8096=5
7777=0
9999=4
7756=1
6855=3
9881=5
5531=0
2581=?

N.B. It turns out my strategy is completely wrong, but read on for an experiment with using eval and parse to generate code on the fly. An explanation and computational solution are available at the original site mentioned above.
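The strategy itself is not shown in this excerpt, so the following is only a generic illustration of the eval/parse idea, not the post's approach: build R source code as a string, parse it into an expression, and evaluate it. The digits, the operators, and the helper build_and_eval are all made up for the example.

```r
digits <- c(2, 5, 8, 1)      # hypothetical input digits
ops <- c("+", "-", "*")      # candidate operators to try

# Build source text such as "2 + 5 + 8 + 1", then parse and evaluate it
build_and_eval <- function(op, digits) {
  code <- paste(digits, collapse = paste0(" ", op, " "))
  eval(parse(text = code))
}

# One generated expression per operator; results are named by operator
results <- sapply(ops, build_and_eval, digits = digits)
results
```

Generating code as text is flexible, but it also evaluates whatever the string contains, so it is best kept to strings the program builds itself.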

Continue reading

Posted in R | 1 Comment

Speeding up agent-based simulations with data frames in R

In health economics it is common to use agent-based simulations for exploring epidemiological models, prevention policies, and clinical interventions, among other things. In C++ I enjoy using object-oriented design to build these agent-based models. It feels so natural. In R, however, I have yet to delve into the S4 object model, and so have instead resorted to using data frames for simple object data structures. Stochastic, agent-based models often require large cohorts and multiple trials, so finding improvements in speed is a great help. The examples listed below are inspired by comments made recently on the r-help list, to whose contributors I am very grateful.
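The post's own examples are not included in this excerpt. As a rough sketch of the kind of comparison involved, assuming a hypothetical cohort with a made-up per-agent infection risk, the same simulation step can be written as an explicit loop over rows or as a single vectorised comparison:

```r
set.seed(1)
n <- 5000                                  # hypothetical cohort size
agents <- data.frame(id = seq_len(n),
                     risk = runif(n, 0, 0.2))  # invented per-agent risk
u <- runif(n)                              # one shared random draw per agent

# Approach 1: explicit loop, one agent at a time
infected_loop <- logical(n)
for (i in seq_len(n)) {
  infected_loop[i] <- u[i] < agents$risk[i]
}

# Approach 2: one vectorised comparison over the whole column
infected_vec <- u < agents$risk

# Both approaches should flag exactly the same agents
identical(infected_loop, infected_vec)
```

Wrapping each approach in system.time() gives a quick way to compare their speed on a given machine; sharing the random draws u keeps the two results directly comparable.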

Continue reading

Posted in Modeling, R | 13 Comments

Using R and clinical heuristics to explore the Heritage Health Prize: what do we gain?

The recent opening of the Heritage Health Prize both represents a milestone and raises a cautionary flag. On the one hand, crowdsourced analytics prizes have never tackled anything so noble (not to discount predicting movie ratings), but on the other hand, are we just looking for nails because we all have hammers?

There is a great introduction to importing and preparing the data set here. What next?

If you were just planning to grind the data set straight through your Weka engine, or simply run an ensemble of 100,000 decision trees (am I allowed to say random forest in my blog?) through your Beowulf cluster, you can stop reading here. If, however, you wonder if an understanding of pathophysiology, epidemiology, and clinical medicine might yield some insight into your approach for analytics in this competition, read on.

Continue reading

Posted in Health, Medicine, R | 5 Comments

The structure of twitter participant relationships in conversations around #Libya, #Bieber, and #Rstats

I am a newcomer to Twitter, and it took me a few weeks to figure out what this was all about. Who are all these people tweeting each other and what do all these trending hashtags mean? Do these people all know each other? How do they get involved in these conversations? Are people really talking/listening to each other, or just spewing 140-character projectiles out into the void?

This piqued my interest in the structure of relationships among participants in different Twitter conversations. I turned to R, with the twitteR and igraph packages, to see what I would find…

Continue reading

Posted in R, Social Networks, twitter | 4 Comments

Grey’s Anatomy Network of Sexual Relations

This all began with an introductory presentation about social network analysis to a group of medical students. What better way to grab their attention than with attractive, fake doctors having sex on television? Naturally this led to the dense network of sexual contacts between characters on the Grey’s Anatomy television show. After viewing many hours of previous episodes and exploring fan pages (especially here for an early attempt at a graph representation of sexual contacts), I was able to come up with an extensive but by no means exhaustive list of contacts. The edge list is available here.

This example uses the igraph package for R, both free to download. First we create the graph, give it a layout, and plot.
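The post's code and edge list are not part of this excerpt. The three steps named above might look roughly like the sketch below, which assumes the igraph package is installed and uses a tiny placeholder edge list with anonymous vertex names instead of the real data. The Fruchterman-Reingold layout is also an assumption; the layout used in the post is not stated here.

```r
library(igraph)

# Placeholder edge list; the real one is linked from the post
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D")
)

# 1. Create an undirected graph from the edge list
g <- graph_from_data_frame(edges, directed = FALSE)

# 2. Give it a layout (force-directed; seeded because it is randomised)
set.seed(7)
lay <- layout_with_fr(g)

# 3. Plot, labelling each vertex with its name
plot(g, layout = lay, vertex.label = V(g)$name)
```

Computing the layout once and passing it to plot() keeps the vertex positions fixed if the graph is redrawn later with different colours or labels.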

Continue reading

Posted in Health, R, Social Networks | 10 Comments