
May 2013

The Center for Medicare and Medicaid Innovation (1/2)

Today’s post is on the Center for Medicare and Medicaid Innovation, which was created by Section 3021 of the Affordable Care Act and, in the words of its website, “supports the development and testing of innovative health care payment and service delivery models.” Led by Richard Gilfillan, M.D. (Director), with Sean Cavanaugh (Acting Deputy Director) and Thomas Reilly (Deputy Director), it pursues the following objectives:

  • Testing new payment and service delivery models,
  • Evaluating results and advancing best practices,
  • Engaging stakeholders.

Testing innovative payment and service delivery models in an appropriate timeframe requires rapid-cycle evaluation, which is described in a recent article in my favorite journal, Health Affairs. The article also “describes the relationship between the group’s work and that of the Office of the Actuary at the Centers for Medicare and Medicaid Services.”

Some of the models currently being evaluated and active programs include:

Accountable Care Organizations: “groups of doctors, hospitals, and other health care providers, who come together voluntarily to give coordinated high quality care to the Medicare patients they serve.” (The Center for Medicare and Medicaid Innovation is part of the Centers for Medicare and Medicaid Services (CMS), the government agency behind Medicaid and Medicare [health care for the poor and the elderly, respectively].) The overall idea is that integrated care will lead to better outcomes through improved coordination, and thus to lower costs.

The video below from CMS explains ACOs (it was posted on YouTube by CMSHHSgov):

An example of an ACO initiative is the Pioneer ACO Model, which is aimed at organizations that already have experience operating as ACOs. Such new payment models are accompanied by requirements to report 33 quality measures, including the Consumer Assessment of Healthcare Providers and Systems (patient experience survey measures), claims-based measures, and the Electronic Health Record Incentive Program measure, among many others, to make sure that the lower cost doesn’t come at the expense of quality. The Shared Savings Program is another related program attempting to rebalance financial incentives for providers.

Bundled Payments for Care Improvement Initiative: in this initiative, a single payment is made for a full episode of care, instead of separate payments for every resource used during treatment (tests, physicians’ time, etc.), which “can result in fragmented care with minimal coordination”. The key, of course, is to define the bundle appropriately (and then set the correct price).

CMS is investigating four broadly defined models of care with participating organizations:

  1. Retrospective Acute Care Hospital Stay Only (the episode of care is defined as the inpatient stay in the acute care hospital),
  2. Retrospective Acute Care Hospital Stay plus Post-Acute Care (inpatient stay plus a defined number of days [30, 60 or 90] after hospital discharge),
  3. Retrospective Post-Acute Care Only (“the episode is triggered by an acute care hospital stay and will begin at initiation of post-acute care services”),
  4. Acute Care Hospital Stay Only (“CMS will make a single, prospectively determined bundled payment to the hospital”).

Payment arrangements will include financial and performance accountability. The website lists the 48 episodes covered by these payment models and related DRGs (Diagnosis-Related Groups).

Next week: the Health Care Innovation Awards and the State Innovation Models initiative.


An International Perspective on Healthcare Cost Containment Strategies

The paper “Health Care Cost Containment Strategies Used in Four Other High-Income Countries Hold Lessons for the United States”, authored by a group of researchers from the University of Toronto, the London School of Economics, Berlin University of Technology, the Paris Health Economics and Health Services Research Units and the University of Regina, and published in the April issue of Health Affairs, reviews strategies developed in the past decade to contain costs in Canada, France, England and Germany.

The four countries were chosen to “represent a range of health system organizational structures”, specifically:

  • In Canada, a “highly decentralized system of national and provincial payers”, where “the individual provinces are responsible for most decisions affecting the health sectors” (the authors focus on Ontario, British Columbia, Manitoba, Saskatchewan and Alberta),
  • In Germany, a “system of competing health insurance or “sickness” funds”,
  • In France, “noncompeting health insurance funds”,
  • In the United Kingdom, responsibility for health care was “shifted from the central government to governing bodies… at the end of the 1990s” and the authors focus on England only.

As their main conclusion, the authors find a shift “toward policies aimed at changing the cost-benefit ratio by tailoring payment to value” through technology assessment and funding based on “activity” (an example of which is diagnosis-related groups) instead of simply pushing more costs to households through “across-the-board budget cuts, rationing of services and higher user charges.”

The analytical foundation of the article is provided by a framework developed by Elias Mossialos and Julian Le Grand in 1999, which “categorizes cost containment strategies according to whether they shift health care expenditures to an alternative budget, usually household budgets, by reducing coverage; set budgets – that is, impose upper limits on health spending in specific areas – from the national level to the patient level; or apply direct or indirect controls to the supply of health care”. A key contribution by Mossialos and Le Grand was to document countries’ shift in strategies over time.

The Health Affairs paper first explains in depth the strategies that the four countries now employ to contain public spending on health care. Below I mention a few (but far from all) of the examples the authors provide.

  • Budget shifting
    • Population coverage (not used in practice)
    • Service coverage (including “refusing to include new interventions that lacked evidence of effectiveness and cost-effectiveness”)
    • Cost coverage (especially increasing patient cost sharing and introducing new user charges, e.g., “France introduced deductibles calculated per service, Canada applied and increased user charges for prescription drugs and Germany did the same for physician visits.” But “Germany introduced a cap on out-of-pocket payments… [and] abolished user charges for hospice care”, among other things. France introduced “free complementary insurance covering user charges for people with very low income.”)
    • Public budget shifting (for instance, “France shifted responsibility for subsidizing long-term care for older people from the central to local governments in 1997.”)
  • Budget setting (the authors note that “activity-based funding has probably softened budget constraints” and that England and France “have emphasized linking provider payment to evidence of quality”)
    • Budget caps (England has a national budget cap but does not set budgets by sector; instead, “local purchasers are able to determine how to spend their own ‘soft’ or target budgets”. Germany has “sectoral budgets for hospitals and ambulatory care”; Canada generally uses soft budget caps at the regional and hospital levels.)
    • Provider payment (“all four countries have… mov[ed] toward activity-based hospital payments”, in addition, there have been “small shifts toward capitation payment” in countries that mainly pay providers on a fee-for-service basis.)
  • Direct and indirect controls of health care supply
    • Controlling pharmaceutical prices
    • Controlling physician remuneration, including rate freezes
    • Other techniques such as cutting the number of hospital beds (France) or increasing the supply of doctors and nurses (Canada).

They then proceed to assess a decade of developments, especially “the increased use of policies intended to promote more efficient use of healthcare services” and not simply budget and price controls.

  • Activity-based funding (“There was a move toward funding based on activity or diagnosis-related group to replace global budgets for hospitals in England, France and Canada”)
  • Health technology assessment (agencies have been established to advise policy-makers)
  • Pharmaceutical spending (All four countries have “explicitly negotiated and worked with pharmaceutical companies and resellers – such as pharmacies – on prices, policies and rebates.” Germany and British Columbia, Canada have experimented with reference pricing and other “value-based approaches to pricing drugs, in which a drug’s clinical value and cost effectiveness are used to negotiate its price or set reimbursement levels.”)

The authors also make the caveat that “many of the strategies reviewed here have multiple goals… For instance, most countries introduced activity-based funding to improve efficiency, quality, transparency and productivity – not necessarily to reduce costs, at least in the short term.”

They provide lessons for other countries, especially the US. You’ll have to read the whole article to know what they think! I found the following idea particularly intriguing: “The United States may wish to use… cost-effectiveness analysis that sets prices for new technologies based on the technologies’ relative value and value-based user charges.”

A very well-written, informative paper with a great international perspective on strategies to contain healthcare costs.


Healthcare Reimbursement Methods

Today’s post is a summary of Chapter 17 of Understanding Healthcare Financial Management, 6th ed., by Gapenski and Pink. The chapter is entitled “Capitation, Risk-sharing, Pay for Performance and Consumer-Directed Health Plans.”

Capitation “is a flat periodic payment per enrollee to a healthcare provider; it is the sole reimbursement for services provided to a defined population… Often, capitation payments are expressed as some dollar amount per member per month (PMPM).” They are adjusted for age and gender, and can also be adjusted for risk. Risk adjustment is “an actuarial process that incorporates health status into the PMPM amount.”
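To make the arithmetic concrete, here is a minimal sketch of a risk-adjusted PMPM payment. The base rate, adjustment factors and panel below are invented for illustration; they are not the book’s numbers:

    # Minimal sketch of a risk-adjusted capitation payment (illustrative numbers only).
    BASE_PMPM = 40.00  # hypothetical base rate, dollars per member per month

    # Hypothetical age/gender adjustment factors, as an actuary might supply.
    AGE_GENDER_FACTORS = {
        ("F", "0-17"): 0.8, ("M", "0-17"): 0.9,
        ("F", "18-44"): 1.2, ("M", "18-44"): 0.9,
        ("F", "45-64"): 1.4, ("M", "45-64"): 1.3,
    }

    def monthly_capitation(panel):
        """Total monthly revenue for a provider's panel.

        Each member is (gender, age_band, risk_score); the risk score
        folds health status into the PMPM amount (risk adjustment).
        """
        return sum(
            BASE_PMPM * AGE_GENDER_FACTORS[(gender, band)] * risk
            for gender, band, risk in panel
        )

    panel = [("F", "18-44", 1.0), ("M", "45-64", 1.8), ("F", "45-64", 0.7)]
    print(f"${monthly_capitation(panel):,.2f} per month")  # revenue is fixed, costs are not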

The authors discuss the financial incentives under capitation and compare the revenue and cost structures under fee-for-service and capitation (Exhibit 17.1, p. 623, is particularly instructive). While capitation exposes providers to losses when costs exceed the flat payment, it also provides more predictable revenues.

Risk sharing is implemented “to encourage providers to act in the best interest of the system rather than self-interest”, in particular to mitigate misaligned incentives: primary care physicians “benefit financially from referring care to a specialist rather than providing that care”, while “specialists, who also receive capitated payments, may not welcome the added volume.”

An example of a risk-sharing arrangement is a risk pool (or withhold): “pools of money that are initially withheld [usually about 10-20% of reimbursement money] and then distributed to panel members [at the end of the year] if they meet certain pre-established goals.”

The book provides two examples, one of a single risk pool, which places only the primary care providers at risk, and one of two risk pools (“a professional services risk pool for the physicians only” and “an inpatient services risk pool shared equally by the HMO, physicians and hospital”).
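A minimal sketch of the withhold mechanics, with all dollar amounts made up (only the 10-20% withhold range comes from the chapter):

    # Illustrative withhold (risk pool) arithmetic - all numbers are made up.
    annual_capitation = 500_000.00  # hypothetical capitation revenue for a panel
    withhold_rate = 0.15            # the chapter cites 10-20%; 15% assumed here

    withheld = annual_capitation * withhold_rate
    paid_during_year = annual_capitation - withheld

    goals_met = True  # e.g., utilization targets reached at year end
    year_end_distribution = withheld if goals_met else 0.0

    print(f"Paid during the year: ${paid_during_year:,.2f}")
    print(f"Year-end risk pool distribution: ${year_end_distribution:,.2f}")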

Pay for performance (P4P) “refers to any reimbursement scheme that makes meeting performance standards a prerequisite for some or all of a provider’s payment.” (Risk pools are a type of P4P payment.)

Performance is usually evaluated according to outcomes, process, patient satisfaction and structure; rewards may be obtained for three types of performance:

  • relative performance (compared to other providers),
  • benchmark performance (based on attaining a pre-identified benchmark),
  • improvement performance (compared to the provider’s past history).

Gapenski and Pink illustrate the concept of P4P with an example involving Pay for Quality and Pay for Productivity.

Consumer-directed health plans “use financial incentives to influence patient behavior” in contrast with P4P schemes, which “seek to influence provider behavior.” They have two components:

  1. a high-deductible health plan, typically with an annual deductible of at least $1,000 (but usually paying for a range of preventive services before the deductible is reached),
  2. a personal health financing account: either a health savings account (HSA, owned by the employee, and to which both employee and employer can make tax-exempt contributions) or a health reimbursement arrangement (HRA, owned by the employer, who is the sole contributor to the account).

Skills industry-bound O.R. MS/PhD grads should have

Graduation will soon be upon us, so it seems as good a time as any to evaluate whether the skills we (university professors in operations research, aka O.R.) teach our graduate students match the skills industry practitioners want from their new hires.

I made a Wordle out of the required/recommended qualifications of 30 industry jobs on the INFORMS OR/MS Classifieds page. I had to clean the data a little since poor Wordle gave a lot of importance to words like "using" or "highly" otherwise, and I removed words like "training", "skills" and "experience", which don't add anything to the Wordle since their meaning depends on context. I could have cleaned the data more, but I didn't have time to make the Wordle look even better.
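For the curious, the cleaning amounted to little more than dropping context-dependent words before counting frequencies. A minimal sketch of that step (the file name and exact word list here are placeholders, not a record of what I actually ran):

    import re
    from collections import Counter

    # Words whose meaning depends on context and that would dominate the Wordle.
    DROPPED = {"using", "highly", "training", "skills", "skill", "experience"}

    with open("informs_job_ads.txt") as f:   # hypothetical file of the 30 job postings
        words = re.findall(r"[a-z+#]+", f.read().lower())

    counts = Counter(w for w in words if w not in DROPPED)
    print(counts.most_common(25))            # feed the survivors to Wordle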

So, here it comes!

[Wordle of the required/recommended qualifications from 30 INFORMS job postings]

From this we learn that communication and team(work) are really, really important to employers - so a proven track record in communicating effectively and working well in teams could make the difference for new O.R. grads with advanced degrees!

The ability to analyze data using statistical techniques also seems much on employers' minds. Note that languages really refers to computing languages; computing itself was one of the words I removed because it overshadowed the rest. (If you're surprised O.R. involves computing, hopefully you're not one of our graduates.) Employers also naturally care a lot about optimization - improving their strategy for the future, instead of only giving a detailed picture of the past.

Other words that pop up often (although they're a bit overshadowed in the picture) are Excel, CPLEX, Java, SAS, C++ and SQL. Regression, time-series and SPSS also make an appearance.

And if you're a student but aren't yet graduating, you know what to work on!


"Algorithmic Prediction of Health-Care Costs"

Today's post will summarize a paper published in 2008 in Operations Research and co-authored by my former PhD advisor, Dimitris Bertsimas, along with six other researchers. The full citation for this paper is: Bertsimas D, Bjarnadottir M, Kane M, Kryder JC, Pandey R, Vempala S and Wang G (2008), Algorithmic Prediction of Health-Care Costs, Operations Research, 56(6): 1382-1392.

The paper demonstrates how modern data-mining methods, in particular classification trees and clustering algorithms, can be used to predict a given year's health care costs based on medical and cost data from the previous two years. The method was built using training data from over 800,000 insured individuals over three years, and its accuracy was checked on an additional testing data set of 200,000 individuals (called "out-of-sample", since this data wasn't used to create the algorithm).

The authors state their conclusions as follows:

"(a) our data-mining methods provide accurate predictions of medical costs and represent a powerful tool for prediction of health-care costs,
(b) the pattern of past cost data is a strong predictor of future costs, and
(c) medical information only contributes to accurate prediction of medical costs of high-cost members."

The approach uses 1,523 variables (see Table 1 of the paper):

  • variables 1-218 are diagnosis groups and counts of claims with diagnosis codes from each group,
  • variables 219-398 are procedure groups,
  • variables 399-734 are drug groups,
  • variables 735-1,485 are medically defined risk factors,
  • variables 1,486-1,489 are counts of members' diagnoses, procedures, drugs and risk factors,
  • variables 1,490-1,521 are cost variables, including overall medical and pharmacy costs, acute indicator and monthly costs,
  • variables 1,522-1,523 are gender and age.

Figure 2 of the paper shows the cumulative health-care costs of the result period for members in the learning sample, with 70% of the total health-care costs being due to around 8% of the population.
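That concentration statistic is easy to recompute on any cost vector. Here is a minimal sketch with synthetic data (the lognormal distribution below is my stand-in, not the authors' data, so the exact percentage will differ):

    import numpy as np

    rng = np.random.default_rng(0)
    costs = rng.lognormal(mean=7.0, sigma=1.5, size=100_000)  # synthetic annual costs

    sorted_costs = np.sort(costs)[::-1]               # most expensive members first
    cum_share = np.cumsum(sorted_costs) / costs.sum()
    k = int(np.searchsorted(cum_share, 0.70)) + 1     # members covering 70% of costs

    print(f"{k / len(costs):.1%} of members account for 70% of total costs")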

Members' costs were partitioned into five different bands, or buckets, "to reduce noise in the data and at the same time reduce the effects of extremely expensive members". The buckets were chosen so that each accounts for approximately the same total dollar amount: the sum of all members' costs in a bucket varies between $116 and $119 million, or about 20% of total costs. The authors describe the buckets as representing low, emerging, moderate, high and very high risk of medical complications, respectively. Cost bucket information is provided in the authors' Table 2:

Cost range          % of learning sample    Number of members
<$3,200             83.9%                   204,420
$3,200-$8,000       9.7%                    23,606
$8,000-$18,000      4.2%                    10,261
$18,000-$50,000     1.7%                    4,179
>$50,000            0.5%                    1,175
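A minimal sketch of how such equal-dollar cutoffs can be derived from a cost vector (again with synthetic data; the thresholds in Table 2 are the authors'):

    import numpy as np

    def equal_dollar_cutoffs(costs, n_buckets=5):
        """Cost thresholds splitting members into buckets that each hold
        roughly the same share of total dollars."""
        costs = np.sort(costs)
        cum = np.cumsum(costs) / costs.sum()
        shares = np.arange(1, n_buckets) / n_buckets           # 0.2, 0.4, 0.6, 0.8
        return [float(costs[np.searchsorted(cum, q)]) for q in shares]

    rng = np.random.default_rng(0)
    costs = rng.lognormal(mean=7.0, sigma=1.5, size=100_000)   # synthetic claims
    print(equal_dollar_cutoffs(costs))  # four cutoffs defining five buckets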

The authors argue that the error measure "R-squared" (R^2) is not appropriate for the problem at hand, and prefer using three other measures:

  • the hit ratio: percentage of the members for which the authors forecast the correct cost bucket.
  • the penalty error: asymmetric to capture opportunities for medical interventions (greater penalty for underestimating higher costs, set to be twice the penalty for overestimating). In mathematical terms, if the forecast bucket is i and the actual bucket is j, the penalty is set to be max {2*(j-i), (i-j)}. The penalty table is provided in Table 3 of the paper.
  • the absolute prediction error: average absolute difference between the forecasted (yearly) dollar amount and the realized (yearly) dollar amount.

(They do include R^2, truncated R^2 and |R| in their performance measures to compare their results with published studies.)
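Here is a minimal sketch of the three measures, coded from the definitions above (my own straightforward reading of the paper's definitions, not the authors' implementation):

    import numpy as np

    def hit_ratio(forecast_bucket, actual_bucket):
        """Fraction of members whose cost bucket is forecast exactly."""
        return np.mean(forecast_bucket == actual_bucket)

    def penalty_error(forecast_bucket, actual_bucket):
        """Asymmetric penalty: underestimating (j > i) costs twice as much
        as overestimating, i.e., max{2*(j - i), (i - j)}."""
        i, j = forecast_bucket, actual_bucket
        return np.mean(np.maximum(2 * (j - i), i - j))

    def absolute_prediction_error(forecast_dollars, actual_dollars):
        """Average absolute gap between forecast and realized yearly cost."""
        return np.mean(np.abs(forecast_dollars - actual_dollars))

    # Tiny example: forecast vs. actual buckets for three members.
    f, a = np.array([1, 3, 5]), np.array([2, 3, 4])
    print(hit_ratio(f, a), penalty_error(f, a))  # 0.33..., (2 + 0 + 1)/3 = 1.0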

The baseline method, i.e., the benchmark against which the authors' methods are compared, uses the health-care costs of the last 12 months of the observation period as the forecast. Performance metrics for this method are shown in Table 6. The overall hit ratio is 80%, but it steadily declines from 90.1% for Bucket 1 (the low risk group) to 19.3% for Bucket 5 (the very high risk group). The other two performance measures likewise worsen for higher cost buckets.

The two data-mining methods implemented by the authors are:

1. Classification trees, which "recursively partition the member population into smaller groups that are more and more uniform in terms of their known result period cost." Tables 8 and 9 show examples of member types that the classification tree algorithm predicts to be in buckets 5 and 4, respectively. For instance, the algorithm predicts bucket 5 for "members in cost bucket 2, with nonacute cost profile, and costs between $2,700 and $6,100 in the last 6 months of the observation period, and with either (a) coronary artery disease and hypertension receiving antihypertensive drugs or (b) has peripheral vascular disease and is not on medication for it." (A rough stand-in sketch of tree-based bucket prediction follows this list.)

2. Clustering algorithms, which "organize objects so that similar objects are together in a cluster and dissimilar objects belong to different clusters." The authors' method adapts "the algorithm behind EigenCluster, a search-and-cluster engine" developed in 2004, to the context of health-care costs. The approach is as follows: the authors "first cluster members together using only their monthly cost data, giving the later months of the observation period more weight than the first months... Then, for each cost-similar cluster, we run the algorithm on their medical data to create clusters whose members have both similar cost characteristics as well as medical conditions." (This two-stage idea is also sketched below.)
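Here is a rough stand-in for the classification tree step: the paper describes its own tree construction, but fitting a tree to (features, cost bucket) data looks like this with scikit-learn. The data below is random noise, purely to make the sketch self-contained:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.random((10_000, 20))            # stand-in for the 1,523 claims variables
    y = rng.integers(1, 6, size=10_000)     # result-period cost bucket, 1-5

    # Recursively partition members into groups that are increasingly uniform
    # in result-period bucket; large leaves keep the rules interpretable.
    tree = DecisionTreeClassifier(min_samples_leaf=200, random_state=0).fit(X, y)
    print(tree.predict(X[:5]))              # forecast buckets for five members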
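And here is a rough stand-in for the two-stage, cost-then-medical clustering idea. EigenCluster itself isn't packaged for reuse, so k-means serves as a generic substitute, and the month-weighting scheme is my assumption:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    monthly_costs = rng.lognormal(5.0, 1.0, (5_000, 24))  # 24 synthetic months
    medical = rng.random((5_000, 50))                     # stand-in medical variables

    # Stage 1: cluster on monthly costs, weighting later months more heavily.
    weights = np.linspace(0.5, 1.5, 24)                   # assumed weighting scheme
    cost_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(
        monthly_costs * weights)

    # Stage 2: within each cost-similar cluster, re-cluster on medical data so
    # final clusters share both cost characteristics and medical conditions.
    final_labels = {}
    for c in range(10):
        idx = np.flatnonzero(cost_labels == c)
        sub = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(medical[idx])
        final_labels.update(zip(idx.tolist(), [(c, int(s)) for s in sub]))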

The resulting performance measures are shown in Table 11. Both data-mining procedures produce results similar to the benchmark's for the bottom bucket (bucket 1) and outperform the benchmark in every other instance; in particular, they approximately double the hit ratio for the top bucket (bucket 5) and achieve an overall hit ratio of about 84%, compared to the baseline method's 80%. This improvement also holds in terms of penalty error and absolute prediction error. The clustering algorithm is a bit stronger at predicting high-cost members; the authors suggest this is because of "the hierarchical way cost and medical information is used."


To New-Course or Not To New-Course? #NSFGrantProposals

I was chatting with a friend of mine via Skype the other day, and she mentioned that she was preparing an NSF CAREER proposal. One thing we talked about was the broader impact requirement, and in particular the fact that just about everybody seems to say they're going to create a new course.

In her previous try (and she doesn't apply to my directorate, so don't try to figure out who it is), she'd written she'd incorporate the results of her research into an existing course, and a reviewer had apparently taken issue with the fact that she wouldn't create a new course.

And we were wondering (i) how many researchers who get those awards and have said in their proposal that they were going to create new courses actually do so (she had someone in mind...), and (ii) whether it really helps the wide dissemination of the research to create a new course. Doesn't it make more sense to incorporate results into an existing course with already established enrollment, which will reach more students and is more likely to be offered in the long term?

I wonder how many new courses based on the NSF-funded research of one faculty member have consistently high enrollment. Will the students of other advisers really care about taking that new course based on research they haven't had a hand in shaping, when they hopefully find their own research more interesting and more valuable? (It's going to be a long six years for them otherwise.) If only the PI's own students care to take the course, then there is no point in pretending the work is being disseminated any more widely than through regular research meetings. [PI=Principal Investigator]

I'm not saying that creating new courses is always a bad thing. I'm saying, however, that creating new courses should not be the automatic answer to the NSF's Broader Impacts requirement, and a case should be made that a new course will attract students beyond the instructor's immediate research group in a sustainable manner.

If it doesn't, then the researcher's real tool for broader impact is sending his or her doctoral students into the workforce after graduation and letting them shine (which is an excellent method, as a matter of fact; it also happens to be my method of choice, although I do like to blog a lot).

This entire discussion also assumes that incorporating research into doctoral-level teaching materials, whether through new courses or existing ones, is the best way to foster wide dissemination. I would also love to see novel research results trickle down to Master's-level courses and perhaps senior electives, although of course they couldn't make up the whole course.

Staying a bit longer with the idea of doctoral-level teaching as broader impact: implicit in it is the "push" approach to dissemination, in which students equipped with new tools push the knowledge into the real world once they graduate. But perhaps its cousin, "pull", should be preferred.

In the "pull" model, industry practitioners are made aware of the new tools through other means than the knowledge of a new hire (and surely the NSF expects more creative means than publishing papers in academic journals), and then insist that their employees implement these tools to gain an advantage over their competitors. After all, a new hire may have great, novel research tools at her disposal but they will only have an impact if her boss cares to have her use them.

I could write about this all day, but going back to the more manageable issue of achieving a broader impact through teaching: do you think creating a new course is best (if you were asked to evaluate standard NSF grant proposals of 3-year duration or longer) or do you favor incorporating results into an existing course?


Robert Merton on Innovation Risk

Nobel Laureate in Economics and MIT professor Robert Merton has written an excellent article on Innovation Risk in the April issue of Harvard Business Review. Here’s one of the most valuable excerpts, which gives a good high-level overview of the whole article: “Some models turn out to be fundamentally flawed and should be jettisoned, while others can be improved upon. Some models are suited only to certain applications; some require sophisticated users to produce good results. And even when people use appropriate models to make choices… it is almost impossible to predict how their changed behavior will influence the riskiness of other choices and behaviors they or others make.” If you only care to remember three sentences about decision-making, make them those three. You’ll already be ahead of the pack.

I also liked Merton’s brief discussion of the Black-Scholes formula in option pricing, which he helped develop, the 2007-2009 financial crisis, which he mentions to illustrate unintended consequences, and his use of pi to make his point about models: for instance, using a value of 4.14 for pi is clearly wrong (although the incorrectness may be difficult to spot), while a value of 3.14 or 22/7 is simply incomplete and may be appropriate in applications such as high school exercises.

He also makes a valuable point in a side box about the credit-rating debacle, a good example of “how adopting a model not fit for your purpose – in this case, using a model for predicting the likelihood of default rather than one for valuing bonds to manage the portfolio – can result in disastrous decisions.” Another side box focuses on systemic risk.

Merton’s framework can be summarized in the following five steps:

  1. Recognize that you need a model for making judgments about risk and return,
  2. Acknowledge your model’s limitations,
  3. Expect the unexpected,
  4. Understand use and user (“A model’s utility depends not just on the model itself but on who is using it and what they are using it for… A model is also unreliable if the person using it doesn’t understand it or its limitations.”),
  5. Check the infrastructure, i.e., the environment into which an innovation is introduced.

This will without a doubt emerge as one of the most important HBR articles of the year.