I often get questions on how I teach business analytics, so I figured it'd be as good a blog topic as any, especially since I had a great conversation this morning with a graduate student who took that class in her first semester and has now included me on her dissertation committee. The analytics course I teach (or taught until last year, since I've been teaching something else this year) is an overview of analytics and has no prerequisite. It has an undergraduate section (a senior elective) and a graduate section. The undergraduate version is taken by students in our major, who are therefore well prepared by that point in their studies, while the graduate version is open to any graduate student in the school of engineering, which creates a much wider distribution of backgrounds and preparedness. Oh, and there is a distance (graduate) section too, which was asynchronous before the pandemic, so the course also has to be engaging for asynchronous viewing.
I start with an introduction to the analytics cycle, also known as CRISP-DM (Cross-Industry Standard Process for Data Mining), and go over select articles from Harvard Business Review, especially Competing on Analytics by Thomas Davenport, which got the analytics field started back in 2006 (fifteen years ago!), and the spin-off articles he has co-authored, such as Data Scientist: The Sexiest Job of the 21st Century, from 2012. If I teach the course again, I will probably update the readings with one of his most recent articles, such as 4 Ways to Democratize Data Science in Your Organization, from 2021. I also think it is good for students and recent grads to read the publications that the people they aspire to become read, and Harvard Business Review is definitely high on that list. It is a little expensive if they can't get a relative to buy them a subscription as a graduation gift, although perhaps HBR gives students and young alumni special rates.
After the high-level introduction to data science, I usually dive into R and cover linear regression, logistic regression, and classification and regression trees (including random forests for the graduate students). Then I take a little break from all the math and cover visualization in Tableau and R's ggplot, return to clustering (hierarchical and k-means), and finish with linear, integer and (my favorite topic) nonlinear optimization. I do tell the students that in practice visualization, or descriptive analytics, is done first, but in the past I've liked putting it halfway through the semester because it gives the students a nice little break at a time when they often have midterms. Maybe if I teach the course again I will just cover the topics in the normal sequence (descriptive analytics first, then predictive, then prescriptive) because it is closer to the real data science process. (Interestingly enough, I started teaching Tableau because a student of mine at Lehigh had just come back from a summer internship at a consulting company where they had learned Tableau and told me I should teach it to my next cohort of students. Always keep in touch with alumni of your courses! They may have great information to share with you.)
My main book is The Analytics Edge, by my PhD advisor Dimitris Bertsimas along with his co-authors Allison O'Hair and William Pulleyblank. What I like even more than the book is the "Analytics Edge" course website on edX.org, which was instrumental in helping me understand how to pace a course that basically everyone can enroll in. While the book itself consists mostly of case studies without code, the appendix has a fantastic introduction to R for beginners, and the book also has several good assignments for each chapter, solved in the instructor's manual. Sadly, I have taught the course for so many semesters that I am reluctant to use those assignments again, because previous students' answers are floating around the Internet.
So what I do instead is use CSV datasets from kaggle.com. In an ideal world, we would have the data in databases and extract it to CSV with SQL queries so the students would get to see the entire process, but the semester has only so many weeks, and it's unrealistic to try to do too much. So I go over the datasets that have received many up-votes and look for something that would illustrate well whatever topic I covered. (I change the datasets every semester, because of websites like coursehero.com where students like to dump their assignments for the next crop of students to look at.) Kaggle.com is such a wonderful thing to have happened to data science, just like GitHub.com. I also use a dataset from kaggle.com for the final project.
Somewhere along the way, I ask students to write an essay on ethics in data science. I provide a list of readings that show them the dangers of data science being misused, and I ask them to add a few more papers on an angle that is particularly interesting to them. This is where the double majors try, when appropriate, to tie the different parts of their curriculum together. I vividly remember a double major in management science and philosophy who was just so happy to use his entire training to write a particularly meaningful essay.
Now the Analytics Edge book provides a great introduction to R, but it is only an introduction. To meet the needs of all the students in the class, including the more advanced ones, a few semesters ago I also started using the R for Marketing Research and Analytics textbook by Chris Chapman and Elea McDonnell Feit. Elea was once the chair of an Analytics conference while I was on the conference committee, which is how I got to know her and learn about her book, and if you don't have her book, you should get it. The book website is tremendously useful, with lots of code and slides for all the chapters. What I like about the book and the related code is that it introduces students to more advanced features and packages in R, so students who are further along the learning curve when the semester starts still get something out of the course. Many of the advanced students have also told me that the course, the way I teach it, provides a nice summary of multiple dimensions of management science, so it is a nice way to wrap up one's graduate degree too.
Somewhere toward the end of the course, I also give a test (I don't have a final exam, only a final project, but I do have one exam in the course). What always amazes me is that the optimization (or prescriptive analytics) questions are what most easily distinguish the students at the top from the rest. Most students can learn the basics of predictive analytics, so they know, say, that you put about 70% of the data in the training set and the rest in the testing set, or that you use linear regression to predict continuous variables and logistic regression to predict 0-1 variables. But the weaker students have tremendous difficulty modeling optimization problems correctly. I think creating good optimization models from data will be the next frontier that distinguishes the top data scientists from the others, especially since few Master's programs in data science have time to cover optimization in any depth, given the amount of material they have to teach the students in only a few semesters.
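As a minimal sketch of those two predictive-analytics rules of thumb (written in Python rather than the R used in the course, with the hypothetical helper names `train_test_split` and `suggest_model` invented here for illustration), the 70/30 split and the choice between linear and logistic regression might look something like this:

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into (train, test).

    The 0.7 default mirrors the "about 70%" rule of thumb; the seed
    is an arbitrary choice that only makes the shuffle reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def suggest_model(target_values):
    """Return a model family based on the type of the target variable.

    A target that only takes the values 0 and 1 suggests logistic
    regression; anything else is treated as continuous here.
    """
    if set(target_values) <= {0, 1}:
        return "logistic regression"
    return "linear regression"


rows = list(range(10))  # stand-in for ten observations
train, test = train_test_split(rows)
print(len(train), len(test))             # 7 3
print(suggest_model([0, 1, 1, 0]))       # logistic regression
print(suggest_model([12.5, 3.1, 7.8]))   # linear regression
```

This only covers the mechanical part that most students pick up quickly; it says nothing about how to formulate an optimization model, which is the harder skill discussed above.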
At the end of the course, I try to give one lecture on big data. This way, if the students have to interact with employees who specialize in big data, they will have some idea of how it is done. (The main course objective is to give the students the background to contribute meaningfully to such conversations, rather than turning them into data scientists overnight, which no single course can do.)
A final piece of advice I give my students, and if you have read this far (let me know in the comments or @ me or DM on Twitter) you deserve to have it too, is that if you are interested in this field, it is good practice to check job postings periodically for the required and recommended competencies. This is in fact a piece of advice I give in the context of career management. Once a student has a job, it is useful to remain competitive on the job market by keeping track of what companies are asking for. This may help students identify gaps, or software they had never heard of that they would benefit from learning as soon as possible, so that their skills stay in top shape. When they see in the list of required skills something they are not good at or have never heard of, they need to develop a strategy for getting better at it, so that they can be given a chance if the opportunity arises. I'll try to write a blog post on a couple of job ads as an example, when I am done grading all my end-of-semester papers!