
"Algorithmic Prediction of Health-Care Costs"

Today's post will summarize a paper published in 2008 in Operations Research and co-authored by my former PhD advisor, Dimitris Bertsimas, along with six other researchers. The full citation for this paper is: Bertsimas D, Bjarnadottir M, Kane M, Kryder JC, Pandey R, Vempala S and Wang G (2008), Algorithmic Prediction of Health-Care Costs, Operations Research, 56(6): 1382-1392.

The paper demonstrates how modern data-mining methods, in particular classification trees and clustering algorithms, can be used to predict a given year's health-care costs from medical and cost data for the previous two years. The methods were developed using claims data from over 800,000 insured individuals over three years, and their accuracy was checked on an additional test data set of 200,000 members. These data are called "out-of-sample" because they were not used to build the algorithms.
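To make the "out-of-sample" idea concrete, here is a minimal Python sketch of holding out a fixed share of members so that none of them is used to build a model. This summary does not say how the paper drew its test set. Hashing member IDs, the 20% default and the names `is_test_member` and `split_members` are all hypothetical choices for illustration.

```python
import hashlib


def is_test_member(member_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically place a member in the held-out set by hashing its ID.

    Hypothetical scheme: the paper's actual sampling method is not described here.
    """
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    return position < test_fraction


def split_members(member_ids, test_fraction=0.2):
    """Split member IDs into (training IDs, out-of-sample test IDs)."""
    train, test = [], []
    for mid in member_ids:
        (test if is_test_member(mid, test_fraction) else train).append(mid)
    return train, test
```

Because the assignment depends only on the ID, rerunning the split gives the same partition, and all of a member's records stay on the same side of it.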

The authors state their conclusions as follows:

"(a) our data-mining methods provide accurate predictions of medical costs and represent a powerful tool for prediction of health-care costs,
(b) the pattern of past cost data is a strong predictor of future costs, and
(c) medical information only contributes to accurate prediction of medical costs of high-cost members."

The approach uses 1,523 variables (see Table 1 of the paper):

  • variables 1-218 are diagnosis groups and counts of claims with diagnosis codes from each group,
  • variables 219-398 are procedure groups,
  • variables 399-734 are drug groups,
  • variables 735-1,485 are medically defined risk factors,
  • variables 1,486-1,489 are counts of members' diagnoses, procedures, drugs and risk factors,
  • variables 1,490-1,521 are cost variables, including overall medical and pharmacy costs, an acute indicator and monthly costs,
  • variables 1,522-1,523 are gender and age.
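As a small illustration of how Table 1's index ranges partition the feature vector, the Python sketch below maps a 1-based variable index to its group. The short group labels and the function names are my own hypothetical shorthand, not the paper's.

```python
from bisect import bisect_left

# Inclusive upper bound of each variable group, following Table 1's ranges.
GROUP_UPPER_BOUNDS = [218, 398, 734, 1485, 1489, 1521, 1523]
# Hypothetical short labels for the groups listed above.
GROUP_NAMES = [
    "diagnosis groups",
    "procedure groups",
    "drug groups",
    "risk factors",
    "member-level counts",
    "cost variables",
    "demographics",
]


def variable_group(index: int) -> str:
    """Return the group label for a 1-based variable index (1..1523)."""
    if not 1 <= index <= GROUP_UPPER_BOUNDS[-1]:
        raise ValueError(f"variable index out of range: {index}")
    # bisect_left finds the first group whose upper bound is >= index.
    return GROUP_NAMES[bisect_left(GROUP_UPPER_BOUNDS, index)]


def group_sizes() -> dict:
    """Number of variables in each group."""
    sizes, lower = {}, 1
    for name, upper in zip(GROUP_NAMES, GROUP_UPPER_BOUNDS):
        sizes[name] = upper - lower + 1
        lower = upper + 1
    return sizes
```

For example, `variable_group(735)` returns `"risk factors"`, and the group sizes add up to 1,523.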

Figure 2 of the paper shows the cumulative health-care costs in the result period (the year whose costs are predicted) for members in the learning sample: around 8% of the population accounts for 70% of total health-care costs.
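The concentration shown in Figure 2 can be computed directly from a list of member costs. This Python sketch, with a hypothetical function name and toy data, returns the smallest fraction of members (most expensive first) whose combined costs reach a given share of the total. On the paper's learning sample, a 70% cost share would correspond to roughly 0.08.

```python
def population_share_for_cost_share(costs, cost_share):
    """Smallest fraction of members, most expensive first, whose combined
    costs reach `cost_share` of the total (0 < cost_share <= 1)."""
    if not 0 < cost_share <= 1:
        raise ValueError("cost_share must be in (0, 1]")
    ordered = sorted(costs, reverse=True)
    target = cost_share * sum(ordered)
    running = 0
    for count, cost in enumerate(ordered, start=1):
        running += cost
        if running >= target:
            return count / len(ordered)
    return 1.0  # only reached through floating-point rounding
```

On toy data, `population_share_for_cost_share([5, 500, 20, 200, 100, 50, 5, 50, 40, 30], 0.75)` returns 0.3: the three most expensive of ten members account for 800 of the 1,000 total dollars.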

Members' costs were partitioned into five bands, or buckets, "to reduce noise in the data and at the same time reduce the effects of extremely expensive members". The buckets were chosen so that each holds approximately the same total dollar amount: the sum of all members' costs in a bucket, which ranges from $116 million to $119 million, is about 20% of total costs. The authors describe the five buckets as representing low, emerging, moderate, high and very high risk of medical complications, respectively. Cost bucket information is provided in the authors' Table 2:

Bucket   Cost range         % of learning sample   Number of members
1        < $3,200           83.9%                  204,420
2        $3,200-$8,000      9.7%                   23,606
3        $8,000-$18,000     4.2%                   10,261
4        $18,000-$50,000    1.7%                   4,179
5        > $50,000          0.5%                   1,175
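The bucket boundaries in Table 2 can be applied with a binary search. The sketch below assigns a bucket to an annual cost and reports, for any list of costs, each bucket's share of members and of total dollars. This is how one could check the roughly equal-dollar property on real data. The table does not say whether a cost exactly at a boundary (e.g., $3,200) belongs to the lower or the higher bucket, so placing it in the higher bucket is an assumption. The function names are also hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Annual-cost bucket boundaries from Table 2 of the paper (dollars).
PAPER_THRESHOLDS = [3_200, 8_000, 18_000, 50_000]


def assign_bucket(cost, thresholds=PAPER_THRESHOLDS):
    """Return the 1-based cost bucket for an annual cost.

    Assumption: a cost exactly on a boundary goes to the higher bucket.
    """
    return bisect_right(thresholds, cost) + 1


def bucket_summary(costs, thresholds=PAPER_THRESHOLDS):
    """Map each bucket to (share of members, share of total dollars)."""
    members = defaultdict(int)
    dollars = defaultdict(float)
    for cost in costs:
        bucket = assign_bucket(cost, thresholds)
        members[bucket] += 1
        dollars[bucket] += cost
    n_members, total_dollars = len(costs), sum(costs)
    return {
        b: (members[b] / n_members, dollars[b] / total_dollars)
        for b in range(1, len(thresholds) + 2)
    }
```

As a sanity check on the table itself, its member counts sum to 243,641. Dividing each count by that total reproduces the percentage column (e.g., 204,420 / 243,641 ≈ 83.9%).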

The authors argue that the commonly used measure "R-squared" (R^2) is not appropriate for the problem at hand, and instead use three other measures:

  • the hit ratio: percentage of the members for which the authors forecast the correct cost bucket.
  • the penalty error: an asymmetric measure intended to capture opportunities for medical intervention. Underestimating a member's cost bucket is penalized twice as heavily as overestimating it by the same number of buckets. In mathematical terms, if the forecast bucket is i and the actual bucket is j, the penalty is max{2(j-i), i-j}. The penalty table is provided in Table 3 of the paper.
  • the absolute prediction error (APE): the average absolute difference between the forecast yearly dollar amount and the realized yearly dollar amount.
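The three measures above are simple to compute once forecasts and outcomes are available. In this Python sketch, buckets are numbered 1 to 5 as in Table 2. The penalty error is taken to be the average penalty over members, which is an assumption about how the paper aggregates it. The function names are hypothetical.

```python
def hit_ratio(predicted, actual):
    """Fraction of members whose forecast bucket equals their actual bucket."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def penalty(forecast_bucket, actual_bucket):
    """Asymmetric penalty max{2(j - i), i - j} for forecast i and actual j.

    Underestimating (j > i) costs 2 per bucket; overestimating costs 1.
    """
    i, j = forecast_bucket, actual_bucket
    return max(2 * (j - i), i - j)


def penalty_error(predicted, actual):
    """Average penalty over members (assumed aggregation)."""
    return sum(penalty(p, a) for p, a in zip(predicted, actual)) / len(actual)


def absolute_prediction_error(predicted_dollars, actual_dollars):
    """Average absolute difference between forecast and realized yearly cost."""
    pairs = zip(predicted_dollars, actual_dollars)
    return sum(abs(p - a) for p, a in pairs) / len(actual_dollars)
```

For example, forecasting bucket 1 for a member who ends up in bucket 3 costs 4 penalty points, while the reverse mistake costs 2.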

(They do, however, report R^2, truncated R^2 and |R| among their performance measures so that their results can be compared with published studies.)

The baseline method, i.e., the benchmark against which the authors' methods are compared, uses the health-care costs of the last 12 months of the observation period as the forecast. Performance metrics for this method are shown in Table 6. The overall hit ratio is 80%, but it declines steadily from 90.1% for Bucket 1 (low risk group) to 19.3% for Bucket 5 (very high risk group). The other two performance measures both worsen for higher cost buckets.
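The baseline is easy to express in code. This Python sketch forecasts next year's cost as the sum of the last 12 monthly costs in the observation period. It then maps that sum to a bucket using Table 2's boundaries and computes the hit ratio separately for each bucket. The following are all assumptions: grouping members by their actual (result-period) bucket when computing per-bucket hit ratios, the boundary convention, and all names here.

```python
from bisect import bisect_right

# Annual-cost bucket boundaries from Table 2 of the paper (dollars).
THRESHOLDS = [3_200, 8_000, 18_000, 50_000]


def assign_bucket(cost):
    """1-based bucket; assumption: a cost exactly on a boundary goes up."""
    return bisect_right(THRESHOLDS, cost) + 1


def baseline_forecast(monthly_costs):
    """Forecast = total cost of the last 12 months of the observation period.

    Returns (forecast dollars, forecast bucket).
    """
    if len(monthly_costs) < 12:
        raise ValueError("need at least 12 months of cost data")
    forecast = sum(monthly_costs[-12:])
    return forecast, assign_bucket(forecast)


def hit_ratio_by_bucket(predicted, actual, n_buckets=5):
    """Share of correct bucket forecasts among members in each actual bucket.

    Returns None for buckets with no members.
    """
    hits = [0] * (n_buckets + 1)
    totals = [0] * (n_buckets + 1)
    for p, a in zip(predicted, actual):
        totals[a] += 1
        if p == a:
            hits[a] += 1
    return {
        b: (hits[b] / totals[b] if totals[b] else None)
        for b in range(1, n_buckets + 1)
    }
```

With 24 months of observation data, only the second year enters the forecast. For instance, a member who spent $100 a month in year 1 and $300 a month in year 2 is forecast at $3,600, i.e., bucket 2.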

The two data-mining methods implemented by the authors are:

1. Classification trees, which "recursively partition the member population into smaller groups that are more and more uniform in terms of their known result period cost." Tables 8 and 9 show examples of member types that the classification tree algorithm predicts to be in buckets 5 and 4, respectively. For instance, the algorithm predicts the following members to be in bucket 5: "members in cost bucket 2, with nonacute cost profile, and costs between $2,700 and $6,100 in the last 6 months of the observation period, and with either (a) coronary artery disease and hypertension receiving antihypertensive drugs or (b) has peripheral vascular disease and is not on medication for it."
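Each path from the root of a classification tree to a leaf is a conjunction of tests on member attributes. The Table 8 example quoted in item 1 can be read as such a rule, with its "either (a) or (b)" effectively joining two paths. The Python sketch below writes that rule as a predicate. The `Member` fields are hypothetical. Treating "cost bucket 2" as the bucket of the member's observation-period cost is an assumption, and so is making the $2,700-$6,100 range inclusive.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical member record; field names are illustrative, not the paper's.
    cost_bucket: int              # assumed: bucket of observation-period cost
    acute_profile: bool           # True if the cost profile is acute
    cost_last_6_months: float     # dollars in the last 6 months of observation
    coronary_artery_disease: bool
    hypertension: bool
    on_antihypertensives: bool
    peripheral_vascular_disease: bool
    on_pvd_medication: bool


def predicted_bucket_5(m: Member) -> bool:
    """The Table 8 example rule, written as a predicate.

    Assumption: the $2,700-$6,100 range is inclusive at both ends.
    """
    cost_path = (
        m.cost_bucket == 2
        and not m.acute_profile
        and 2_700 <= m.cost_last_6_months <= 6_100
    )
    # (a) coronary artery disease and hypertension, on antihypertensives
    condition_a = (
        m.coronary_artery_disease and m.hypertension and m.on_antihypertensives
    )
    # (b) peripheral vascular disease, not on medication for it
    condition_b = m.peripheral_vascular_disease and not m.on_pvd_medication
    return cost_path and (condition_a or condition_b)
```

A fitted tree would discover such thresholds and splits from data. The predicate only shows what one learned leaf amounts to once it is read off the tree.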

2. Clustering, whose algorithms "organize objects so that similar objects are together in a cluster and dissimilar objects belong to different clusters." The authors' method adapts "the algorithm behind EigenCluster, a search-and-cluster engine" developed in 2004, to the context of health-care costs. The approach works as follows: the authors "first cluster members together using only their monthly cost data, giving the later months of the observation period more weight than the first months... Then, for each cost-similar cluster, we run the algorithm on their medical data to create clusters whose members have both similar cost characteristics as well as medical conditions."
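The two-stage idea can be sketched as follows. This is not EigenCluster. For simplicity, the Python code below swaps in plain k-means (Lloyd's algorithm with a deterministic farthest-first initialization) at both stages. The following are illustrative assumptions: the linearly increasing month weights, the numeric medical feature vectors, the cluster counts and all function names.

```python
def time_weights(n_months):
    """Linearly increasing month weights (assumed scheme): later months count more."""
    return [(t + 1) / n_months for t in range(n_months)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-first initialization.

    Stand-in for EigenCluster; returns one label in 0..k-1 per point.
    """
    k = min(k, len(points))
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_stage_clusters(monthly_costs, medical, k_cost, k_medical):
    """Stage 1: cluster members on time-weighted monthly costs.
    Stage 2: within each cost cluster, cluster on medical features.
    Returns a (cost_cluster, medical_cluster) pair per member.
    """
    weights = time_weights(len(monthly_costs[0]))
    weighted = [[w * x for w, x in zip(weights, row)] for row in monthly_costs]
    cost_labels = kmeans(weighted, k_cost)
    result = [None] * len(monthly_costs)
    for c in sorted(set(cost_labels)):
        idx = [i for i, lab in enumerate(cost_labels) if lab == c]
        sub_labels = kmeans([medical[i] for i in idx], k_medical)
        for i, m in zip(idx, sub_labels):
            result[i] = (c, m)
    return result
```

Each member ends up labeled by a (cost cluster, medical cluster) pair. Two members share a final cluster only if they are similar in both their time-weighted cost history and their medical profile.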

The resulting performance measures are shown in Table 11. Both data-mining procedures produce results similar to the benchmark for the bottom bucket (bucket 1) and outperform it in every other instance. They approximately double the hit ratio for the top bucket (bucket 5) relative to the benchmark, and they achieve an overall hit ratio of about 84%, compared with the baseline method's 80%. This improvement also holds in terms of penalty error and APE. The clustering algorithm is somewhat stronger at predicting high-cost members, which the authors attribute to "the hierarchical way cost and medical information is used."

