Quantitative metrics and the Vietnam War

I recently saw all 10 episodes of the PBS documentary The Vietnam War by Ken Burns and Lynn Novick, a superb documentary for the most part, and for today's post I wanted to talk about Robert McNamara's misguided emphasis on quantitative metrics to determine whether the war was being won. In particular, McNamara focused on a single metric, the kill ratio (the number of North Vietnamese troops killed for each U.S. soldier killed). This led U.S. troops to classify civilian victims as enemy combatants and inflate their numbers, giving the impression the war was being won. So the first issue was the definition of measures, which led to distortion in their computation.

Also, in the documentary someone makes the excellent comment that it didn't matter to the American public if the kill ratio was 10:1 (10 North Vietnamese soldiers killed for each American soldier), because all that Americans cared about was that one U.S. soldier who got killed. At issue then was also the interpretation of measures. 

In academia it is easy to come up with examples of flawed metrics. For instance, using only GPA to assess a student's performance in college may result in grade inflation if an instructor doesn't want to deal with students arguing for a better grade. Ranking students, maybe not with a precise rank to avoid cutthroat competition but in terms of percentile, as is done on the GRE (and in college entrance exams in France and Turkey, for instance), could provide good additional information. Graduating "summa cum laude", "magna cum laude" or "cum laude" could also serve that purpose, although the requirements for each vary from school to school.

Relying on teaching evaluations to assess teaching can encourage instructors to be overly lenient so that students will rate them highly. I remember in high school in the French system our teachers were assessed through a visit from an official from what would be a cross between the U.S. school district and the state. (These visits didn't happen very often, but it was a big deal when they did.) Maybe having peer evaluations of one's teaching in addition to teaching evaluations would help give a more accurate picture of someone's teaching ability.

Of course, there is also the issue of quantifying research output through a researcher's number of publications. A recent article in the New York Times discussed the situation in China, which "has retracted [since 2012] more scientific papers because of faked peer reviews than all other countries and territories put together." And new journals keep proliferating to serve as outlets for more academics' work. Those journals must then be ranked, typically through a metric called the impact factor, and journal editors try to boost their impact factor by asking authors to cite other papers published in the journal or by commissioning review papers. Referees, for their part, sometimes use the cover of anonymity to ask the authors of papers they review to add references to the referees' own work. (Those cases are relatively easy to spot: either a single suggested reference has only a tenuous link to the paper under review, or several suggested references share an author.)

If you give people a metric, they will try to use it to their advantage. In the case of the Vietnam War, the consequences were disastrous.

The army has come a long way since those days and has tried to learn from its mistakes, as shown in this RAND report on assessment and metrics in counterinsurgency. Its author argues that centralized assessment fails in counterinsurgency and should be replaced by a bottom-up, contextual assessment process. I've been thinking about how this could translate into evaluating universities. I suppose the equivalent would be for each department to evaluate itself and then pass its assessment up the hierarchy. But how would the results be validated and communicated to the outside public? How would we incentivize truthfulness in the never-ending quest for resources? Wars have a clear result at the end. It is much harder to tell whether one university succeeded in rising above another. Yet we all keep an eye on the U.S. News & World Report rankings while pretending we don't care.

My other thought was: what would (above-board, PG-rated) guerrilla warfare look like in the context of universities vying for better national spots? Would that relate to Clayton Christensen's disruptive innovation, with the up-and-coming university staying under the radar until it has siphoned off many potential applicants, whether faculty or students, from the incumbent?

"The Skill that Industry Hires Need"

According to this post in Science magazine (geared toward PhD students and academics in science), industry employers particularly value project management skills in new hires, "including working in a team and delivering on schedule and on budget". I found this particularly striking because project management is perhaps the least used skill in graduate school. There is no hard timeline for submitting a research paper or earning a degree. Some doctoral students take five years to graduate; others take eight. There is little concept of a schedule to be kept, research-wise. In a way, not only are graduate students not taught project management, they are taught the opposite: it takes the time that it takes; what matters is the end result. No wonder, then, that graduate students with industry internships have an advantage over the competition when they seek industry positions.

(As a side note, this got me thinking about the skill that analytics students need the most, since I teach analytics rather than science. Project management matters for analytics students too, of course, but in my experience students struggle the most with the idea that there might not be a single best model for their data. You can create, say, a linear regression keeping only the coefficients significant at the 95% confidence level, or insist on the 99% level; you can include categorical variables, with some labels highly significant and others less so; or you can build both a logistic regression model and a classification tree to predict a binary outcome, each with its advantages and disadvantages. Students are sometimes disappointed when there isn't a unique best answer. But you create your models based on the data you have, and the process is necessarily imperfect. You can still get good insights from your model.)
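To make the "no single best model" point concrete, here is a toy sketch with invented data and hand-written threshold rules (standing in for fitted models): each of two equally defensible classifiers wins on a different criterion, so neither dominates.

```python
# Toy illustration: two defensible classifiers, neither dominant.
# Invented data: (feature_1, feature_2, label) triples.
data = [(1, 5, 0), (2, 4, 0), (3, 6, 1), (4, 1, 0),
        (5, 7, 1), (6, 5, 1), (7, 8, 1), (8, 4, 1)]

def evaluate(predict):
    """Return (accuracy, recall) of a prediction rule on the toy data."""
    correct = sum(1 for x1, x2, y in data if predict(x1, x2) == y)
    tp = sum(1 for x1, x2, y in data if y == 1 and predict(x1, x2) == 1)
    positives = sum(1 for _, _, y in data if y == 1)
    return correct / len(data), tp / positives

rule_a = lambda x1, x2: 1 if x1 >= 5 else 0   # wins on accuracy
rule_b = lambda x1, x2: 1 if x2 >= 4 else 0   # wins on recall

print(evaluate(rule_a))  # (0.875, 0.8)
print(evaluate(rule_b))  # (0.75, 1.0)
```

Which rule is "best" depends on whether missing a positive is costlier than a false alarm, which is exactly the kind of judgment the data alone cannot settle.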

Going back to the theme of this post, I think undergraduate students learn more about project management through their capstone project at the end of their studies than doctoral students ever do. That makes sense, given that most undergrads go on to industry positions right after graduation (only a few get a Master's degree before starting work), but it is time to recognize the changed job prospects for PhDs too. Could we bring project management to academic research itself? Grant proposals already ask us principal investigators to do as much, with budget justifications, deliverables and intermediate milestones, but graduate students are rarely involved in defining those. Maybe universities should provide more training on these matters.

Or maybe this could motivate a stronger emphasis on doctoral programs with time-constrained "praxis" capstone projects rather than dissertations, such as the D.Eng. rather than the PhD. Perhaps it is even time for a renaissance of doctoral degrees that aren't PhDs in order to better meet industry needs, or for the creation of an intermediate degree between the Master's and the PhD. When I was at MIT, my department (Electrical Engineering and Computer Science) offered the degree of Electrical Engineer, aimed at doctoral students who had completed all coursework in the PhD program: the All But Dissertation folks. Obviously most A.B.D.s don't plan on working in academia, and maybe an advanced degree geared toward industry would be better suited to their career goals. This raises the issue of degree visibility and name recognition if only a handful of universities offer the new degree, but given today's pace of change, it would make sense to introduce new degrees more suited to the needs of the workforce.

We could even imagine a system where students earn credentials for each year of graduate study (or some number of credits, to account for part-time students), with "Graduate Credential Level 1" received at the end of the first year (perhaps similar to a Master of Engineering), "Level 2" at the end of the second year (equivalent to a Master of Science), and then "Level 3", "Level 4" and so on, with students able to stop for a few years in between if they so wish. There is a lot of talk on campuses these days about continuing education, but it is unrealistic to expect these trends to fit neatly within existing degree programs. It is time for new graduate degrees.

INFORMS Wagner Prize: Call for Entries

Each year INFORMS grants several prestigious institute-wide prizes and awards for meritorious achievement. The Daniel H. Wagner Prize for Excellence in Operations Research Practice emphasizes the quality and coherence of the analysis used in practice. Dr. Wagner strove for strong mathematics applied to practical problems, supported by clear and intelligible writing. This prize recognizes those principles by emphasizing good writing, strong analytical content, and verifiable practice successes.

Applicants to this prestigious award have come from a variety of areas, including health care, logistics, supply chain, political districting, manufacturing, cancer therapeutics, and machine learning.

The deadline for submitting a 2-page abstract for the Wagner Prize is: May 1, 2017.

More information here.

From the Economist

Some interesting articles from a recent issue of The Economist:

  • Superstition ain't the way, about China's data: the apparent unreliability of certain figures and the search for more reliable indicators. Some quotes: "Provincial GDP figures do not add up to the national total. Quarterly and annual growth do not always mesh... Growth has not dipped below 6.7%, even as prices slipped into deflation in late 2015... As far back as 2000, scholars turned to indicators like electricity consumption as a statistical refuge from what one called the "wind of falsification and embellishment" rustling the official data. But electricity is a less reliable guide as an economy evolves away from power-hungry industry toward low-wattage services." The article discusses an index combining rail freight, electricity and bank lending to gauge economic activity, called the "Li Keqiang index" in honor of China's premier, who suggested it. The Federal Reserve Bank of San Francisco has refined the index by fitting it to GDP figures from 2000-2009, but while its predictive power remained strong until 2012, it decreased afterward, perhaps because of the boom in financial services that recently boosted China's GDP. The article also describes other attempts at constructing more insightful indices with more data; Goldman Sachs has even combined 89 products. One remaining issue is converting those output figures into monetary values. Official GDP figures, though, may be improving thanks to advances in technology.
  • Called to account: the disturbing prosecution of Greece's chief statistician. This is a must-read about the high price Greece's chief statistician, Andreas Georgiou, who also has 21 years of experience at the IMF, is paying for trying to keep the numbers honest. His "crime" was to estimate the government's 2009 budget deficit at 15.4% of GDP, a figure the European Commission has confirmed as accurate. His detractors blame him for the panic that led to Greece's bail-out in 2010 and for the harsh conditions imposed by Greece's creditors. The Economist writes: "Courts have rejected these charges three times. But on August 1st the Greek supreme court reopened the case." Georgiou also faces separate charges for "refusing to allow ELSTAT's board to use a vote to decide on the level of the deficit. Statistics are not supposed to work by ballot."
  • Fashion forward, about Zalando, which sells shoes and clothes online in Europe. (This business model is old news in the United States but not in Europe.) Part of its success can be attributed to the close attention it pays to data. It also targets higher-value, brand-conscious shoppers, while Amazon targets more price-conscious customers, and so it hopes to retain its lead in the market even as Amazon and Alibaba build up their offerings in Europe.
  • Leaving for the city: a Schumpeter column about how many prominent American companies are moving downtown. The main example is General Electric, which is moving from Fairfield, CT to the Boston waterfront in the new innovation district and apparently won't have a car park, to encourage people to take public transportation. Since I saw the innovation district at length when I was back at MIT for my sabbatical two years ago, I can only say this is the sort of idea that looks good on paper and will distress employees once they see what the Silver Line is like. On the other hand, this might have the unexpected side effect of encouraging employees to get home in time for dinner and spend time with their families, since they're not going to want to take the Silver Line late. Taking public transportation to work sounds appealing when you're in your 20s and live in an apartment in the city a few bus stops away from your workplace. If you want your kids to go to a top school district like Brookline or Newton, the commute is going to seem less appealing. I mention this about Boston, but it is true of other cities as well. My guess is that at some point the tax incentives are going to lose their luster. The Schumpeter columnist recommends reading "The Big Sort" by Bill Bishop and explains: "It argues that Americans are increasingly clustering in distinct areas on the basis of their jobs and social values. The headquarters revolution is yet another iteration of the sorting process that the book describes, as companies allocate elite jobs to the cities and routine jobs to the provinces." But Fairfield, CT was never a backwater. The part of the column that distinguishes between mass headquarters in sunbelt cities and executive headquarters in elite cities was quite interesting as well. For a good view of the future, read the part of the column about San Francisco.
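On the Li Keqiang index: as a rough sketch of how such a composite activity index can be fitted to GDP figures, here is a tiny ordinary-least-squares example in Python. All numbers and the recovered weights are invented for illustration; this is not the San Francisco Fed's actual data or methodology.

```python
# Illustrative only: fit weights of a composite index (rail freight,
# electricity, bank lending) to GDP growth by ordinary least squares.
# The four "years" of data below are made up and constructed so that
# gdp = 0.25*rail + 0.40*elec + 0.35*loans exactly.
rail  = [1.0, 2.0, 3.0, 4.0]
elec  = [2.0, 1.0, 4.0, 3.0]
loans = [3.0, 4.0, 1.0, 2.0]
gdp   = [2.10, 2.30, 2.70, 2.90]

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

X = list(zip(rail, elec, loans))
# Normal equations: (X^T X) w = X^T y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(X[k][i] * gdp[k] for k in range(len(X))) for i in range(3)]
w = solve(xtx, xty)
print([round(v, 2) for v in w])  # [0.25, 0.4, 0.35]
```

The Fed's refinement described in the article amounts to exactly this kind of calibration, with the extra wrinkle that the fitted weights can go stale as the economy's structure shifts, which is what the article reports happened after 2012.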

Best presentation at the #Analytics16 conference...

...(at least among the ones I saw) was by Dr. Tim Niznik of American Airlines, who gave an outstanding talk on hub disruption management. He had the last time slot on Tuesday, always a difficult slot since so many people leave to catch an evening flight, but his talk was extremely well attended. His presentation was very engaging, with a mix of visuals such as weather maps and tools such as the diversion tracker and the gate demand chart. The goal is to figure out how to strategically delay some flights in order to minimize excess gate demand, operations beyond airport closure time, system-wide passenger impact and the delay introduced, without violating crew and curfew rules. This talk was so good that I hope Dr. Niznik can be a plenary or semi-plenary speaker at an INFORMS conference very soon. The high quality of his team's work deserves visibility and dissemination to as large an audience as possible (and I don't even fly American). If you're a conference attendee, you can see his slides by logging into INFORMS Connect, clicking on My Communities, entering the Analytics 2016 community and browsing through the latest shared slides on the bottom left - he was part of the Tuesday Decision and Risk Analysis track.

The Analytics16 conference was one of the very best conferences I have attended in recent memory, and I'd like to thank Elea Feit for doing such an amazing job chairing the organizing committee (I know because I was part of the committee), as well as all my colleagues who helped put together such a remarkable event. The bar is set high for next year. The conference will be held in Las Vegas, NV, April 2-4, 2017. Mark your calendars!

Innovative Applications in Analytics Finalist: Detecting Preclinical Cognitive Change

This morning I attended a great talk in the "Innovative Applications in Analytics Finalist" track: detecting preclinical cognitive change, by Dr. Randall Davis and Dr. Cynthia Rudin of MIT. Given the increased prevalence of dementia and Alzheimer's among the elderly, the associated health care expenses, and the heart-wrenching situation of relatives who, when the disease is in an advanced stage, are no longer recognized by a dear parent, it is critical to diagnose cognitive decline as early as possible, so that action can be taken and people can enjoy as much time as they can with a dementia-afflicted relative while that person is still himself or herself. Over 5 million people have been diagnosed with Alzheimer's in the U.S., and the associated healthcare costs could soon be in the billions of dollars. The aging of the population also means that early diagnosis of dementia has emerged as one of the most pressing healthcare issues of our time. (The approach is also applicable to other conditions, such as sleep apnea.)

The talk showed how the classical clock-drawing test can be leveraged, using new tools and technology, to gain more information on a patient's cognitive state. There are in fact two clocks: the command clock (the patient is asked to draw a clock showing ten past eleven) and the copy clock (the patient is shown an image of a clock showing ten past eleven and has to reproduce it). The key is to analyze the process of drawing the clock and not just the final result. The team of Dr. Davis, Dr. Rudin and their coauthors has been able to do that using a specially designed pen (equipped with a camera) and special paper (which lets the pen know where it is on the page). They call their test the digital clock drawing test. This allows them to measure key metrics such as the time it takes the patient to draw the first hand of the clock after he or she has drawn the clock face and placed the numbers. It turns out that this pre-first-hand latency - the time it takes the patient to figure out where to draw that first hand - can help distinguish Alzheimer's from depression. Total thinking time is also an important metric, as is the "disappearing hooklet" on the first 1 of the "11" among the numbers. (When you are done drawing the first 1, you are already thinking about drawing the second 1 from top to bottom, so there should be a small hook at the bottom of the first 1, pointing toward the top of the second 1. A disappearing hooklet is one of the first signs of cognitive decline.)
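As a sketch of the kind of computation involved, here is how the pre-first-hand latency could be derived from a hypothetical log of timestamped pen strokes. The stroke format, labels and timings below are all invented for illustration; the actual digital pen's data format is surely richer and different.

```python
# Hypothetical stroke log from a digitizing pen: (label, start_s, end_s).
# Invented data, for illustration of the latency computation only.
strokes = [
    ("clock_face", 0.0, 3.2),
    ("number",     3.9, 4.4),
    ("number",     4.8, 5.3),
    ("number",     5.8, 6.2),
    ("hand",       8.7, 9.5),   # first hand, drawn after a pause
    ("hand",       9.9, 10.6),
]

def pre_first_hand_latency(strokes):
    """Seconds between finishing the numbers and starting the first hand."""
    last_number_end = max(end for label, _, end in strokes if label == "number")
    first_hand_start = min(start for label, start, _ in strokes if label == "hand")
    return first_hand_start - last_number_end

print(round(pre_first_hand_latency(strokes), 2))  # 2.5
```

The same stroke log would also support the other metrics mentioned in the talk, such as total thinking time (the sum of the gaps between strokes).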

In addition, the final drawing has traditionally been scored by physicians in widely different ways, based on the distortion of the clock face, incorrect placement of the hands, and so on. The talk's authors showed convincingly how cutting-edge machine learning algorithms such as Supersparse Linear Integer Models (SLIMs) and Bayesian Rule Lists (BRLs) can create decision rules that resemble the operational guidelines of physician-created scoring rules. This matters because it increases transparency and makes it more likely that physicians will adopt the new methods, since they are close to models they already know. Physician-generated scoring systems achieved an AUC (area under the receiver operating characteristic curve) in the range of 0.66 to 0.79, where 0.5 is random and 1.0 is perfect prediction. Machine learning with all features achieved an AUC of 0.93 but is not as intuitive as the traditional physician-generated scoring systems. Models based on SLIMs or BRLs achieve a tradeoff between those extremes, with an AUC on the order of 0.8, improving on traditional physician-driven models while retaining high interpretability. As such, they are "centaurs" - human-machine combinations that are better than either alone - applied to one of the greatest healthcare challenges of our time.
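For readers unfamiliar with the metric, here is a minimal sketch of how AUC is computed for a scoring rule, using the rank-based (Mann-Whitney) formulation. The scores and labels are invented; this is not the study's data.

```python
# Minimal AUC computation for a point-based scoring rule (invented data).
scores = [3, 5, 4, 2, 3, 1]   # risk score assigned by some rule
labels = [1, 1, 1, 0, 0, 0]   # 1 = impaired, 0 = healthy

def auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(round(auc(scores, labels), 3))  # 0.944
```

This pairwise view makes the 0.5-is-random, 1.0-is-perfect scale in the paragraph above concrete: a random score would win about half the positive-negative comparisons, a perfect one all of them.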

Digital Cognition Technologies, Inc. is marketing the technology, now pending FDA approval.

Read more about this research here (news release), here (papers of the MIT CSAIL Multimodal Understanding Group) and here (Dr. Rudin's papers). Specifically, you can read the paper that accompanies the Innovations in Analytics Award entry here: "Learning Classification Models of Cognitive Conditions from Subtle Behaviors in the Digital Clock Drawing Test." Fascinating stuff!

Monday's poster session

I was very impressed by the poster session at Analytics16 yesterday, both in terms of the quality of the posters and the number of people who stopped by to ask me questions about my work. I was presenting my research with Dr. Ruken Duzgun on multi-range robust optimization, with a focus on a case study comparing two-range robust optimization (2R-RO) with stochastic programming (SP). While traditional RO of the Bertsimas & Sim variety ends up considering only the nominal and worst-case values of each coefficient at optimality, multi-range RO can incorporate more than 2 scenarios (2R-RO can have up to 4, for instance) and thus offers a bridge between traditional RO and SP. Our approach solves within seconds, while SP hits the time limit of one day. You can read our papers here and here. Thanks to Sudharshana Srinivasan for taking my picture!
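For readers unfamiliar with robust optimization, here is a minimal sketch of the budgeted-uncertainty idea behind Bertsimas & Sim-style RO, evaluated for a fixed decision. All numbers are invented, and this is deliberately not our multi-range formulation (which lets coefficients take values in several ranges rather than just nominal or worst-case); it only illustrates the baseline our work extends.

```python
# Budgeted uncertainty, worst case for a fixed decision (invented numbers):
# at most gamma profit coefficients drop from their nominal value by their
# maximum deviation, and the adversary picks the gamma largest deviations.
nominal   = [10.0, 8.0, 6.0]   # nominal profit of each selected project
deviation = [4.0, 3.0, 1.0]    # maximum downward deviation of each profit

def worst_case_profit(nominal, deviation, gamma):
    """Total profit after the adversary degrades the gamma worst coefficients."""
    hit = sorted(deviation, reverse=True)[:gamma]
    return sum(nominal) - sum(hit)

print(worst_case_profit(nominal, deviation, 0))  # 24.0 (nominal case)
print(worst_case_profit(nominal, deviation, 1))  # 20.0
print(worst_case_profit(nominal, deviation, 3))  # 16.0 (full worst case)
```

The budget gamma trades conservatism for protection; at optimality each coefficient effectively sits at either its nominal or its worst-case value, which is precisely the two-value limitation that multi-range RO relaxes.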

Richard E. Rosenthal Early Career Connection Program

The Richard E. Rosenthal Early Career Connection (RER ECC) on Sunday was a great success! RER ECC participants mingled with conference attendees selected for their shared expertise and record of contributions to INFORMS. I particularly want to thank Elea Feit, Mike Trick and Robin Lougee for taking the time to share their insights with the RER ECC participants (with apologies to anyone I've forgotten). The program targets young professionals only a few years into the workforce. I was very impressed by this year's cohort's record of accomplishments and their potential to implement large-scale analytics at companies like General Motors or Air Liquide. All of them were extremely articulate in addition to exceptionally talented, and I am sure we will hear about their operations research accomplishments in the future. I am including the slide my co-chair Tarun Mohan Lal of Mayo Clinic prepared to introduce the RER ECC participants during one of the Analytics16 keynote addresses, which contains their names and pictures (make sure to congratulate them on their selection to RER ECC if you see them), as well as a picture of the reception taken by RER ECC '15 alumna Sudharshana Srinivasan, who was instrumental in helping Tarun and me deliver a high-quality program this year.


ECC research at INFORMS Annual Meeting

Today's post is about the presentations that two participants in the Richard E. Rosenthal Early Career Connection program - Shokoufeh Mirzaei and Ehsan Salari - gave of their work at the 2015 INFORMS Annual Meeting in Philadelphia, PA. (Nominations for the 2016 ECC program are now open and due March 4! More information is available here.)

Dr. Mirzaei (Shokoufeh hereafter), a tenure-track Assistant Professor at Cal Poly Pomona, gave a talk on open problems in computational structural biology and protein quality assessment. She explained why researchers care about the structure of a protein (shape affects function). Many diseases, such as Alzheimer's and Parkinson's, arise from misfolded proteins, and understanding a protein's interactions with other proteins is important in the drug discovery process. The goal of this line of work is to design new proteins with desired functions not currently found in nature, with the hope that computational work will be able to replace, at least in part, experimental work, i.e., drug trials on individuals. One challenge is that there is no clearly defined energy function, so there is no clear objective to minimize. Criteria that can be used in the optimization framework include hydrogen bonds, van der Waals interactions, backbone and angle preferences, electrostatic interactions, and more. These criteria lead to highly nonlinear and nonconvex expressions, making the problem even more challenging.

Shokoufeh then discussed the Protein Data Bank (which offers opportunities for template-based, homology-based and free modeling) and the WeFold Coopetition, whose purpose is to encourage "coopetition" (competition + cooperation) among labs to improve the state of knowledge in protein structure prediction. Certain classes of prediction targets have seen only modest gains over the past few years, and such an event therefore has the potential to speed up the rate of discovery. Open problems in the field include finding the best scoring function and the best metrics to compare two protein structures. Shokoufeh then commented on the problem of creating a benchmark data set for testing different proteins and discussed computational approaches such as MESHI (which uses the clustering nature of proteins) and support vector machines.

This is a field I know nothing about, and I was struck by the clarity of Dr. Mirzaei's presentation as well as her effectiveness in making very complex problems understandable to a lay audience. She proved a very articulate and effective speaker who convincingly made the case for her research. In today's world, analytics professionals must not only have the quantitative tools to make a difference but must also communicate their work effectively, and there is no doubt Dr. Mirzaei will soon be a star in her domain.

Dr. Salari (Ehsan hereafter), Assistant Professor at Wichita State, gave a talk entitled "Biologically-guided radiotherapy planning: fractionation decisions in the presence of chemoradiotherapeutic drugs." His talk was based on his recent paper published in IIE Transactions on Healthcare Systems Engineering; you can find a technical-report version of the work here. Radiotherapy (RT) uses high-energy radiation beams to kill cancer cells by damaging their DNA. The fraction of cells that survive exposure to a given radiation dose follows a linear-quadratic model. RT treatments are delivered in daily fractions, so a regimen is determined by the number of fractions and the radiation dose per fraction. The effects of fractionation are accounted for using a concept called the biologically effective dose (BED).
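For reference, here are the linear-quadratic survival model and the BED in their standard textbook form (the notation in Ehsan's paper may differ):

```latex
% Surviving fraction of cells after n fractions of dose d each,
% with radiosensitivity parameters alpha and beta:
S = \exp\!\bigl(-n(\alpha d + \beta d^2)\bigr)
% Biologically effective dose of the same regimen, governed by
% the tissue-specific ratio alpha/beta:
\mathrm{BED} = n d \left(1 + \frac{d}{\alpha/\beta}\right)
```

The alpha/beta ratio differs between tumor and healthy tissue, which is what makes the choice of n and d (hypofractionation versus many small fractions) a genuine optimization problem.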

Chemotherapy can be administered sequentially or concurrently with RT, but it increases the risk of complications. It is therefore important to determine the impact of chemotherapeutic agents on optimal RT fractionation regimens. Additivity and radio-sensitization both affect the linear-quadratic curve describing the cell survival rate. Ehsan proposed an approach to extend the BED model to quantify the radiation damage, and studied the optimal radiation fractionation regimen and drug administration scheme under four settings: RT only, CRT with additive effects only, CRT with radio-sensitization effects only, and CRT with combined effects. His presentation contained many insightful graphs on the structure of the optimal regimen, which you can also find in his paper.

Like Shokoufeh, Ehsan proved to be an exceptional researcher, delivering a compelling, insightful presentation of quality far above the average presentation at the annual meeting. I am looking forward to reading more of his papers.