The recent opening of the Heritage Health Prize both represents a milestone and raises a cautionary flag. On the one hand, crowdsourced analytics prizes have never tackled anything so noble (not to discount predicting movie ratings). On the other hand, are we just looking for nails because we all have hammers?
There is a great introduction to importing and preparing the data set here. What next?
If you were just planning to grind the data set straight through your Weka engine, or simply run an ensemble of 100,000 decision trees (am I allowed to say random forest in my blog?) on your Beowulf cluster, you can stop reading here. If, however, you wonder whether an understanding of pathophysiology, epidemiology, and clinical medicine might yield some insight into your analytics approach for this competition, read on.