Hasso-Plattner-Institut
Prof. Dr. Tobias Friedrich
 

Introduction

Imagine \(m\) machines (e.g., a compute cluster or robots in a production line) on which you want to process \(n\) jobs (think datasets, workpieces, etc.). Which job should go to which machine in which order to minimize makespan? How would you maximize throughput? Can you ensure that you meet all the deadlines for the individual jobs?

The underlying problem is that of scheduling jobs on machines. Each job comes with processing time requirements on the different machines and possibly release times, deadlines, or precedence constraints. The machines, in turn, might all be identical or differ in speed or purpose, and there might be requirements on the order in which jobs pass through the machines. Finally, one might want to optimize one of many possible objective functions, such as the makespan, the maximum lateness of any job, or the number of late jobs. Computing an (approximately) optimal schedule that assigns jobs to machines for specific time intervals is a core problem of algorithm engineering. The "scheduling zoo" [2] is a collection of many results on scheduling problems.

We analyze scheduling problems with an additional twist: Instead of running an algorithm to compute a complete schedule and only then using it, we expect a solution algorithm to enumerate schedule entries \((j, i, s, f)\) with the meaning: "Process job \(J_j\) on machine \(M_i\) from start time \(s\) to finish time \(f\)." We aim for an algorithm with little preprocessing time and small delay between the enumerated solution parts. Then we can execute the schedule as soon as the first entries are fixed, which may be well before the algorithm finalizes the whole solution.
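
To make the interface concrete, the following minimal Python sketch shows how such an enumeration could be consumed; the generator name enumerate_schedule and the input format are assumptions, but every enumeration algorithm discussed below fits this pattern.

    def execute(enumerate_schedule, jobs):
        # Consume schedule entries (j, i, s, f) one by one. The executor can
        # dispatch the first entries while the enumeration algorithm is still
        # working on the rest of the schedule.
        for (j, i, s, f) in enumerate_schedule(jobs):
            print(f"process job {j} on machine {i} from time {s} to {f}")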

Examples

We make use of the standard 3-field notation introduced by Graham et al. [4] as presented by Brucker [1].

\(1 | | C_{max}\)

With only one machine, it is trivial to minimize the latest finish time: all schedules without idle time are optimal, and we can enumerate such a schedule without preprocessing and with constant delay.
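
A minimal sketch of such an enumerator, assuming the input is simply a list of processing times:

    def enumerate_single_machine(processing_times):
        # 1 || C_max: any order without idle time is optimal.
        # No preprocessing; each entry is emitted with constant delay.
        t = 0
        for j, p in enumerate(processing_times):
            yield (j, 0, t, t + p)  # job j on the single machine (index 0)
            t += p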

\(P | | C_{max}\)

Computing a schedule for two or more identical parallel machines that minimizes the latest finish time is NP-hard, as such a schedule solves the partition problem. Hence, assuming \(P \neq NP\), we cannot hope for an efficient enumeration algorithm either.

However, scheduling the jobs according to the LPT-rule (Largest Processing Time first) gives a \(\left(\tfrac{4}{3} - \tfrac{1}{3m}\right)\)-approximation as shown by Graham [3]. We can enumerate such a schedule for \(n\) jobs with \(O(n)\) preprocessing and \(O(\log(n))\) delay with HeapSort.
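
A possible heap-based sketch of this enumeration (the input format is an assumption; the delay per entry is \(O(\log n + \log m)\), i.e., \(O(\log n)\) for \(m \le n\)):

    import heapq

    def enumerate_lpt(processing_times, m):
        # P || C_max: LPT rule, a (4/3 - 1/(3m))-approximation.
        # Heapify is O(n) preprocessing; every further entry takes
        # O(log n + log m) time.
        jobs = [(-p, j) for j, p in enumerate(processing_times)]
        heapq.heapify(jobs)                    # max-heap via negated times, O(n)
        machines = [(0, i) for i in range(m)]  # (current load, machine index)
        heapq.heapify(machines)
        while jobs:
            neg_p, j = heapq.heappop(jobs)     # longest remaining job
            load, i = heapq.heappop(machines)  # currently least loaded machine
            start, finish = load, load - neg_p
            heapq.heappush(machines, (finish, i))
            yield (j, i, start, finish)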

\(1 | r_j | C_{max}\), \(1 | d_j | L_{max}\)

The same HeapSort approach yields optimal schedules for one machine with release times and the latest-finish-time objective by sorting the jobs in non-decreasing order of release time. Similarly, sorting according to the EDD rule (Earliest Due Date first) yields schedules for one machine and jobs with due dates that minimize the maximum lateness.
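
The release-time variant might look as follows; replacing the sort key \(r_j\) by the due date \(d_j\) (and starting each job right after its predecessor) gives the EDD variant. The input format is an assumption.

    import heapq

    def enumerate_release_times(jobs):
        # 1 | r_j | C_max: process jobs in non-decreasing order of release time.
        # jobs is assumed to be a list of (release_time, processing_time) pairs.
        # Heapify is O(n) preprocessing; each entry is emitted with O(log n) delay.
        heap = [(r, j, p) for j, (r, p) in enumerate(jobs)]
        heapq.heapify(heap)
        t = 0
        while heap:
            r, j, p = heapq.heappop(heap)
            start = max(t, r)  # possibly wait until the job is released
            t = start + p
            yield (j, 0, start, t)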

\(1 | prec | C_{max}\)

If we introduce a precedence relation in the form of a directed acyclic graph (DAG), the jobs have to be processed according to a topological ordering. Enumerating a topological ordering is possible without preprocessing and with delay in the order of the maximum out-degree in the precedence graph. Additional preprocessing is necessary if we want the schedule to be enumerated beginning with the first entry.
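
The sketch below enumerates such a schedule via Kahn's algorithm. Unlike the stronger claim above, it spends a linear preprocessing pass on computing in-degrees, but afterwards the delay per entry is bounded by the maximum out-degree; the successor-list input format is an assumption.

    from collections import deque

    def enumerate_topological(processing_times, successors):
        # 1 | prec | C_max: process the jobs in a topological order of the DAG.
        # successors[j] lists the jobs that must not start before job j finishes.
        n = len(processing_times)
        indegree = [0] * n
        for j in range(n):
            for k in successors[j]:
                indegree[k] += 1
        ready = deque(j for j in range(n) if indegree[j] == 0)
        t = 0
        while ready:
            j = ready.popleft()
            yield (j, 0, t, t + processing_times[j])
            t += processing_times[j]
            for k in successors[j]:  # at most "max out-degree" many steps
                indegree[k] -= 1
                if indegree[k] == 0:
                    ready.append(k)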

Open Problems

We can transform many classical scheduling algorithms into enumeration algorithms, notably those that make use of a simple sorting rule. However, there are also variants and algorithms that do not yield an obvious enumerative solution. Take, for example, \(1 | d_j | \Sigma U_j\), the problem of scheduling jobs with due dates on a single machine while minimizing the number of late jobs. The Moore-Hodgson algorithm [5] solves this problem by repeatedly rejecting "bad" jobs. Virtually any job could be rejected in the very last step, making it hard to enumerate solution parts early. Thus, an enumeration algorithm would probably have to follow a different approach. Finding such new enumeration approaches for typical scheduling variants remains an open problem.
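
For illustration, a standard formulation of the Moore-Hodgson algorithm (input format assumed): the job rejected in an iteration is the longest one scheduled so far, so even the final iteration may remove a job that was tentatively placed at the very beginning of the schedule.

    import heapq

    def moore_hodgson(jobs):
        # 1 | d_j | sum U_j: minimize the number of late jobs.
        # jobs is assumed to be a list of (due_date, processing_time) pairs.
        # Returns the indices of the on-time jobs; the rejected jobs can be
        # appended at the end of the schedule in any order.
        order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])  # EDD order
        scheduled = []  # max-heap over processing times (negated)
        t = 0
        for j in order:
            d, p = jobs[j]
            heapq.heappush(scheduled, (-p, j))
            t += p
            if t > d:  # the current job would be late: reject the longest job so far
                neg_p, _ = heapq.heappop(scheduled)
                t += neg_p  # neg_p is negative, i.e. remove that job's processing time
        return {j for _, j in scheduled}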

References

  1. P. Brucker, Scheduling Algorithms, 5th ed. Berlin; New York: Springer, 2007.
  2. P. Brucker et al., “The scheduling zoo.” http://schedulingzoo.lip6.fr/.
  3. R. L. Graham, “Bounds on Multiprocessing Timing Anomalies,” SIAM J. Appl. Math., vol. 17, no. 2, pp. 416–429, Mar. 1969, doi: 10.1137/0117039.
  4. R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, “Optimization and Approximation in Deterministic Sequencing and Scheduling: a Survey,” in Annals of Discrete Mathematics, vol. 5, P. L. Hammer, E. L. Johnson, and B. H. Korte, Eds. Elsevier, 1979, pp. 287–326. doi: 10.1016/S0167-5060(08)70356-X.
  5. J. M. Moore, “An n Job, One Machine Sequencing Algorithm for Minimizing the Number of Late Jobs,” Management Science, vol. 15, no. 1, pp. 102–109, 1968.