In a previous post, I discussed different approaches to speeding up some loops in data frames. In particular, R data frames provide a simple framework for representing large cohorts of agents in stochastic epidemiological models, such as those representing disease transmission. This approach is much easier and likely faster than trying to implement cohorts of R objects. In this post we’ll explore a simple agent-based model, and then benchmark a few different approaches to iterating through the cohort. Rcpp outperforms all of them by a few orders of magnitude. Priceless.
Let’s say we are trying to predict the probability of someone choosing to receive a vaccination in a given year. The decision will be based on their age (age), gender (female), and whether or not they were infected with the virus last year (ily). Let’s make up some data:
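As a rough sketch of what such made-up data could look like, the snippet below builds a cohort as a data frame with the three variables named above. The cohort size, the distributions, the logistic coefficients, and the helper vaccinate_prob are all illustrative assumptions, not values from the original post.

```r
# Hypothetical cohort: size and distributions are assumptions for illustration.
set.seed(42)
n <- 1000
cohort <- data.frame(
  age    = sample(18:90, n, replace = TRUE),  # age in years
  female = rbinom(n, 1, 0.5),                 # 1 = female, 0 = male
  ily    = rbinom(n, 1, 0.1)                  # 1 = infected last year
)

# Assumed logistic model for the yearly vaccination probability.
# The coefficients are invented; they only need to give values in (0, 1).
vaccinate_prob <- function(age, female, ily) {
  plogis(-3 + 0.03 * age + 0.2 * female + 1.0 * ily)
}

# Vectorized over the whole cohort: one probability and one draw per agent.
cohort$p_vacc     <- vaccinate_prob(cohort$age, cohort$female, cohort$ily)
cohort$vaccinated <- rbinom(n, 1, cohort$p_vacc)

head(cohort)
```

Each row is one agent, so iterating through the cohort amounts to iterating over rows, which is the operation worth benchmarking.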
Posted in Health, Medicine, Modeling, R
Tagged agents, benchmarking, C++, infectious diseases, modeling, programming, R, Rcpp
In a recent blog post by CMastication, a little meme puzzle is presented with the introduction that a preschooler could solve it in 5-10 minutes, a programmer in an hour. I took the bait.
The original problem goes like this:
N.B. It turns out my strategy is completely wrong, but read on for an experiment with using parse to generate code on the fly. An explanation and computational solution are available at the original site mentioned above.
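For readers unfamiliar with the technique, here is a minimal sketch of generating code on the fly with parse. It is not the puzzle code from the post; the operands and operators are arbitrary placeholders.

```r
# Build candidate expressions as strings, then parse and evaluate each one.
ops   <- c("+", "-", "*")
exprs <- paste("6", ops, "3")   # "6 + 3", "6 - 3", "6 * 3"

# parse(text = ...) turns a string into an unevaluated expression;
# eval() then runs it in the current environment.
values <- sapply(exprs, function(s) eval(parse(text = s)))

print(values)
```

The appeal is that the set of expressions can be assembled programmatically, for example by looping over every combination of operators, before any of them is evaluated.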
Posted in R
Tagged fun, games, R, whimsy
In health economics it is common to use agent-based simulations for exploring epidemiological models, prevention policies, and clinical interventions, among other things. In C++ I enjoy using object-oriented design to build these agent-based models. It feels so natural. In R, however, I have yet to delve into the S4 object model, and so have instead resorted to using data frames for simple object data structures. Stochastic, agent-based models often require large cohorts and multiple trials, so finding improvements in speed is a great help. The examples listed below are inspired by comments made recently on the r-help list, to whose contributors I am very grateful.
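To give a flavor of the kind of speedup in question, the sketch below compares a row-by-row loop over a data frame of agents with a vectorized equivalent. The agents data frame, the risk function, and the cohort size are hypothetical and chosen only for illustration; they are not the examples from the r-help discussion.

```r
# Hypothetical cohort of agents stored as a data frame.
set.seed(1)
n      <- 10000
agents <- data.frame(age = sample(20:80, n, replace = TRUE))

# Assumed per-agent infection risk, increasing with age (invented for this sketch).
risk  <- function(age) 0.001 * age
draws <- runif(n)   # one random draw per agent, shared by both approaches

# Approach 1: explicit loop over agents.
infected_loop <- logical(n)
for (i in seq_len(n)) {
  infected_loop[i] <- draws[i] < risk(agents$age[i])
}

# Approach 2: the same rule applied to whole columns at once.
infected_vec <- draws < risk(agents$age)

# Both approaches give the same result; system.time() can be wrapped
# around each one to compare their running times.
stopifnot(identical(infected_loop, infected_vec))
```

Because both versions consume the same random draws, any difference between them is purely a matter of speed, not of model behavior.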
The recent opening of the Heritage Health Prize both represents a milestone and raises a cautionary flag. On the one hand, crowdsourced analytics prizes have never tackled anything so noble (not to discount predicting movie ratings), but on the other hand, are we just looking for nails because we all have hammers?
There is a great introduction to importing and preparing the data set here. What next?
If you were just planning to grind the data set straight through your Weka engine, or simply run an ensemble of 100,000 decision trees (am I allowed to say random forest in my blog?) through your Beowulf cluster, you can stop reading here. If, however, you wonder if an understanding of pathophysiology, epidemiology, and clinical medicine might yield some insight into your approach for analytics in this competition, read on.