Dr. Cynthia Barnhart gave the Spencer Schantz distinguished lecture at Lehigh last week. Or rather lectures, plural, because there were two of them: the first one on Thursday, December 4, was a technical talk geared toward the faculty and students in the Industrial and Systems Engineering department, while the second one, the following day, was intended for a broader audience. Both talks focused on the application of operations research to the airline industry, with different amounts of math in each.
Composite Variables.
(Severe geek alert ahead. If you don't care for math, wait for next post tomorrow. I'm serious.)
While Barnhart discussed a wide range of topics in the technical talk, from robust scheduling to operations recovery and dynamic scheduling (wouldn't it be helpful if airlines reassigned a flight to a different plane when an aircraft is delayed, rather than sticking stubbornly to its original plan?), the concept that most got my attention is the one of composite variables, which Barnhart has discussed many times before and yet never fails to amaze me, because it shows how much of an impact the initial modeling of a business problem can have on its tractability. This is especially true in logistics, where the large-scale nature of the problems encountered makes it imperative to design smarter algorithms.
A review of the branch-and-bound algorithm. The idea is that many logistics problems use integer variables (number of trucks, yes/no assignment decisions modeled using 0/1 variables, etc). Such problems are typically approached by first solving their linear relaxation (the problem where we forget that variables have to be integer, which is much easier to solve than the original problem), and then slowly reintroducing the fact that variables have to be integer.
For instance, if we find that the number of trucks used is equal to 3.2 in the linear relaxation, we can say that at optimality, either the number of trucks will be less than or equal to 3, or it will be greater than or equal to 4. We can say that because there is no integer number strictly between 3 and 4, so the optimal integer solution can't be in the part we have just removed. Then we solve these new linear problems again, look up which variables are not yet integer, and reiterate. This is called the branch-and-bound algorithm.
A good example is available on these OpenCourseWare notes of a course I TA'ed for [was a Teaching Assistant for] many years ago, starting at the bottom of p.3. (Since the variables are binary, they can only take values 0 or 1, so the branching is on the variable equal to 0 or the variable equal to 1.) If you keep adding constraints to a subproblem to prevent the variables from being non-integer, at some point you will arrive at an integer solution and that subproblem will be solved. You will also have a feasible integer solution to the original problem.
What is appealing about branch-and-bound is that, once you have a feasible integer solution, you can sometimes figure out that analyzing other subproblems is going to be a waste of your time, so you don't have to enumerate all the possibilities. This is because solving the linear relaxation gives us a bound on the optimal objective of the integer subproblem. For instance, if you have a minimum-cost problem, the linear relaxation will always have lower cost than (or equal to) the original integer problem, because it has fewer constraints - we have removed the constraints that the variables had to be integer - so we have more latitude to do whatever we want to decrease cost. So if we happen to have a feasible integer solution that gives us a cost of 100 and the linear relaxation of another subproblem gives us 103.4, we know that adding constraints to that subproblem to make the decision variables integer will only increase cost, so we'll never get lower than 103.4 and don't need to study that subproblem in more detail.
The better the bound, the more information we get and the faster we are going to eliminate subproblems and find the optimal solution. The issue with traditional modeling for logistics problems is that the linear relaxations typically give poor bounds. So we can't remove subproblems easily, and the computer takes forever to solve the original problem because we have to go through all the subproblems to find the optimal solution. This would be annoying for small-size instances, but it becomes a major issue in logistics, where some formulations have hundreds of thousands of variables.
Composite variables. The idea behind composite variables is to reformulate the problem in an equivalent way that gives better bounds at the linear relaxation level. What composite variables do is to capture several decisions simultaneously. For instance (the following example is taken from these slides, p.15), if you have to deliver 6,000 packages to the sorting center and you can either use an aircraft with capacity 3,000 or an aircraft with capacity 8,000, you know right away you will either use two small planes or a big one to match the demand. So instead of writing 3000*x1+8000*x2>=6000, you can write 6000*x3+8000*x2>=6000. Initially, x1 (number of small planes) could have been 0, 1, 2 and x2 (number of big planes) 0, 1, but now both x3 (whether we use small planes) and x2 (whether we use big planes) are 0 or 1. In other words, composite variables here capture the choice between pre-defined strategies, rather than letting the computer struggle by itself with defining the details of each strategy.
It gets better! (Really, it does.) Once you have written all that, you can replace the 6000*x3+8000*x2>=6000 constraint by the connectivity constraint x3+x2>=1, that says the sorting center must be supplied somehow (either using the big-aircraft strategy or the small-aircraft strategy). Before, the connectivity constraint x1+x2>=1 would not have guaranteed that the center had all the packages, since it allowed for only one small aircraft to supply the sorting center (x1=1, x2=0), bringing only 3000 out of 6000.
The fact that, with this new formulation, we can drop the demand-capacity constraint 6000*x3+8000*x2>=6000 is good news because (1) we like having fewer constraints, (2) connectivity constraints like x3+x2>=1 go through integer points (x3+x2=1 is the line going through x3=1, x2=0 and x3=0 and x2=1). In contrast, 6000*x3+8000*x2>=6000 goes through (x3=1,x2=0), which we like because it's integer, but also through (x3=0,x2=0.75), which we don't like, because if the linear relaxation ends up picking that point as the optimal solution, we won't have integrality of the variables. That would mean more work ahead of us to get to an integer solution. Obviously, there are other constraints in the problem that mean that, even after all this, we won't necessarily have an integer solution right away, but at least we'll be a lot closer.
This method resulted (according to p.20 of this presentation) in 5.3% savings in annual operating costs and a 7.69% reduction in number of planes used, which also meant the leases on these planes could be terminated, leading to additional savings. Logistics modeling isn't for the faint of heart (after all, that's why operations research PhDs are in high demand in industry), but all that hard work was definitely worth it for the company involved.