I finally got around to reading the June 2018 issue of OR/MS Today, with its catchy headline "Is 'fake' data science a threat?" Can anyone say no to that? The cover admits sheepishly: "Yes. How should the O.R. profession respond to 'charlatans'?" When I read the article itself, I was surprised that it treated data science itself as a potential threat to operations research and made sweeping statements such as "O.R. has always focused on decisions or actions", "data science often stops at having found interesting phenomena", "'charlatans' claim to be doing traditional statistics with little evidence for their claims" and "data science appears to be the latest and possibly most threatening manifestation of a desire by people from other disciplines to get the benefit of OR/MS and applied probability without having to learn much mathematical statistics." More reasonable is the comment that practitioners should "consider the theoretical justifications to [their] calculations" and know how to "check whether the assumptions underlying the analysis are defensible."
The article attempts to tackle two different issues at once: (1) the greater name recognition of data science compared to operations research, and (2) the fact that the hype for data science has brought the danger that under-qualified people will get hired as data scientists and end up doing more harm than good in their organizations. The name "operations research" has come to mean much more than research on operations for, say, the military facing logistical challenges during World War II; it now encompasses research from finance to health care. This has made it hard to communicate what we do to a lay audience. On the other hand, "operations research" sounds obscure enough that it easily conveys the difficulty of the techniques involved. These days, everyone generates data through Internet browsing and smartphone use, and uses data, from Amazon.com ratings to sports statistics, so data science seems familiar enough, for better or for worse.
As interest in analytics and data science has grown since the 2011 McKinsey report on Big Data (with a 2016 follow-up), universities have launched analytics programs by the dozens, trying to cram as much material as possible on descriptive, predictive and prescriptive analytics into the meager 30 credit hours they have to transform students into analytics experts. This puts pressure on faculty to cover a lot of topics fast. I don't think anyone working in data science today wants to view himself as a charlatan, and I would rather stay away from the name-calling. But what to do with students who may be weak in certain areas of analytics or have forgotten the assumptions underlying the models? Or consultants who try to generate follow-up projects at any cost?
It might be hard for academics to admit, but a lot of people in industry aren't really after mathematical rigor. Academics are after mathematical rigor because it reflects well on their teaching and because rigor is necessary to get their papers published. Practitioners are after improving their current business situation, and mathematical rigor is helpful to generate buy-in for whatever idea they come up with. If someone has an insight that saves money, practitioners may well implement it, even if, say, the assumptions for linear regression weren't satisfied. This is very upsetting for academics to think about, and hopefully nowadays we have enough success stories about the proper use of advanced analytics at Amazon.com or Uber to motivate more practitioners to implement data science rigorously. But maybe some of the people viewed as charlatans are, in some circumstances, pragmatists and realists. (Not all, though. Some might well be incompetent; more on that below.)
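To make "checking the assumptions" concrete, here is a minimal sketch of one such diagnostic for linear regression: fitting a straight line to a relationship that is actually curved and inspecting the residuals. The data and variable names are synthetic and purely illustrative; statsmodels is just one library that exposes these diagnostics.

```python
# Minimal sketch of checking a linear-regression assumption (linearity).
# Synthetic data; all names and numbers are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x**2 + rng.normal(0, 2, size=200)  # true relationship is quadratic

fit = sm.OLS(y, sm.add_constant(x)).fit()  # misspecified straight-line model

# A high R^2 alone does not validate the model...
print("R^2:", round(fit.rsquared, 3))

# ...but the residuals betray the misspecification: they correlate strongly
# with the squared (centered) predictor, i.e., they show systematic curvature.
print("corr(residuals, centered x^2):",
      round(np.corrcoef(fit.resid, (x - x.mean()) ** 2)[0, 1], 3))
```

A practitioner who skips this kind of check can ship a model with an impressive R^2 that is nonetheless systematically wrong in the tails.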
Perhaps the best solution is to give other graduates the tools to recognize people whose understanding of data science is very superficial, and in that sense it is beneficial to graduate as many would-be data scientists as possible so that they can challenge co-workers' assumptions and methodologies. About six or seven years ago I interacted with a company that was trying to predict college loan defaults, and the person who had created the model had used linear regression to predict 0/1 variables. In those days many students only ever saw linear regression as a predictive analytics technique. Nowadays, I like to think more students are familiar with logistic regression, and soon that will mean more managers too. This is an example of a situation where it is best if managers have the same technical background as the employees they supervise.
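As a minimal sketch of why that modeling choice matters, the following example contrasts the two models on synthetic loan data using scikit-learn (the debt_to_income feature and all numbers are hypothetical): linear regression can return predicted "probabilities" outside [0, 1], while logistic regression keeps them in range.

```python
# Sketch: why linear regression is a poor fit for a 0/1 outcome such as
# loan default, and what logistic regression does instead.
# Synthetic data; feature names and numbers are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 500
debt_to_income = rng.uniform(0, 1.5, size=n)        # hypothetical feature
# True default probability rises with the debt-to-income ratio.
p_default = 1 / (1 + np.exp(-(6 * debt_to_income - 4)))
defaulted = rng.binomial(1, p_default)              # observed 0/1 outcome

X = debt_to_income.reshape(-1, 1)

# Linear regression treats the 0/1 labels as continuous values;
# its predictions can fall below 0 or exceed 1.
lin = LinearRegression().fit(X, defaulted)
lin_pred = lin.predict([[0.0], [1.5]])

# Logistic regression models the log-odds, so predicted probabilities
# always stay within [0, 1].
log = LogisticRegression().fit(X, defaulted)
log_pred = log.predict_proba([[0.0], [1.5]])[:, 1]

print("linear regression predictions:   ", lin_pred)   # may leave [0, 1]
print("logistic regression probabilities:", log_pred)  # always in [0, 1]
```

The out-of-range predictions are exactly the kind of flaw a colleague with a bit more training can catch in a model review.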
We also should recognize how the rise of social media platforms such as LinkedIn and Twitter has pushed some individuals to brand themselves a certain way without having the formal qualifications to do what they say they do in any sort of depth. Which is not to say that they are incapable of doing it, since some might have taught themselves from books. Just that, given the sheer number of people who state they do analytics but have no formal, rigorous qualifications on their resume, the odds are that many of them either don't know what they're doing or are sticking to simple tasks that come nowhere near representing the expertise of data scientists.
But I also think we in academia have to pause and ask ourselves about the role we collectively might have played in creating this situation. Everyone wants to jump on the analytics bandwagon. Business schools create "business analytics" programs of varying quality, which might well be, in some cases, analytics watered down so that the non-quantitative crowd can get some training in data science rooted in neither statistics departments nor operations research departments. Universities want to answer the needs of the workforce, but they also want to generate abundant tuition money. In data science in particular, it is tempting (as one continuing education program I know of plans to do) to hire many adjunct practitioners in the field, supervised by one faculty member, to teach as many students as are willing to enroll. This has the advantage of scale and keeps costs down. But if those adjunct practitioners have only superficial knowledge of data science, such programs will simply graduate more practitioners with superficial knowledge.
So, what to do? The most advisable course of action for a company would be to hire enough PhDs (whether in math, statistics, or operations research) into its data science teams. This would create a critical mass of PhDs who can interact with Master's graduates, identify flaws in their reasoning, and offer mentoring. A Master's may be great at providing enough tools for the graduate to interface intelligently with more technical staff. But to know enough about all of data science and a lot about some specific area of it, a longer degree than the Master's might be needed.