A Programming Language and Compiler View on Data Management and Machine Learning Systems

Tiark Rompf, Purdue University, West Lafayette, USA

Abstract

In this talk, we will take a look at data management and machine learning systems from a programming languages perspective, and demonstrate how principled PL approaches can simplify the design of such systems by an order of magnitude while achieving best-of-breed performance. An old but underappreciated idea known as Futamura projections fundamentally links interpreters and compilers, and leads to efficient query compilation using techniques that are no more difficult than writing a query interpreter in a high-level language. We present Flare, a new back-end for Spark SQL that yields significant speedups by compiling query plans to native code, competitive with the best compiled query engines from the database community. In the deep learning space, the current generation of systems tends to limit either expressiveness and ease of use for increased performance (e.g., TensorFlow) or vice versa (e.g., PyTorch). We demonstrate that a “best of both worlds” approach is possible. We present Lantern, a system that takes a fresh look at backpropagation, and provides a generic and performant differentiable programming framework based on multi-stage programming and delimited continuations, two orthogonal ideas firmly rooted in programming languages research.

Biography

Tiark Rompf joined Purdue University in 2014 and he is currently co-directing the Purdue Center for Programming Principles and Software Systems (PurPL). His scientific home is in programming languages and compilers, but his contributions span areas such as architecture, databases, and machine learning. From 2008 to 2014 he was a member of the team at EPFL that designed the Scala programming language and he made various contributions to the Scala language and toolchain (delimited continuations, efficient immutable data structures, compiler speedups, formalization of the type system). He received his PhD from EPFL in 2012, and was a researcher at Oracle Labs from 2012 to 2014. He has numerous collaborations with industry, and serves as scientific advisor for AI chip startup SambaNova Systems. His work received awards including a CACM Research Highlight 2012, VLDB Best Paper 2014, NSF CAREER 2016, DOE Early Career Award 2017, and the VMware Systems Research Award 2018.

A recording of the presentation is available on Tele-Task.

Summary

written by Emanuel Metzenthin and Stefan Reschke

Rompf emphasizes that developing efficient data engineering systems requires combining many different branches of computer science research. In his talk in the Lecture Series on Practical Data Engineering at HPI, he focuses on the role of programming languages and compilers in the development of these systems.

He begins by presenting results from his early research career. The first result [1] is a technique for achieving asynchronous program behaviour while keeping an easy-to-follow, synchronous-looking code style. It makes use of the programming language's type system: a 'sleep' function, for example, can be tracked in the type system via an effect annotation, which allows the 'sleep' primitive to run asynchronously. This technique, based on delimited control operators, can be seen as a way to mitigate the dreaded 'callback hell' often found in complex asynchronous code.
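A minimal sketch may make the problem concrete (all names here are hypothetical, and this is plain callback style, i.e. the kind of code the type-directed CPS transform of [1] lets programmers avoid writing by hand):

```scala
import java.util.{Timer, TimerTask}

// Hypothetical sketch: an asynchronous 'sleep' in explicit callback style.
object AsyncSketch {
  val timer = new Timer()

  // instead of blocking the thread, sleep schedules a continuation k
  def sleep(ms: Long)(k: () => Unit): Unit =
    timer.schedule(new TimerTask { def run(): Unit = k() }, ms)

  def main(args: Array[String]): Unit = {
    // 'callback hell': every asynchronous step nests one level deeper
    sleep(100) { () =>
      println("first")
      sleep(100) { () =>
        println("second")
        timer.cancel()
      }
    }
  }
}
```

With the approach of [1], the same logic can be written as two sequential sleep calls in direct style; the compiler inserts the nested callbacks automatically, guided by the effect annotation in the type of sleep.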

The second result [2] again uses the type system, this time with the goal of having high-level code execute as efficiently as optimized low-level code. By overloading operations on a marker type constructor, efficient execution can be enabled while maintaining a higher level of abstraction: operations on a floating-point variable, for example, can be implemented so that they construct a computation graph, which then enables fast execution.
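As an illustration, consider the following hypothetical sketch in the spirit of [2] (the type names are illustrative, not the actual LMS API): arithmetic on a marker type builds a computation graph instead of computing a result.

```scala
// Hypothetical sketch: a marker type whose overloaded operations
// construct a computation graph instead of evaluating immediately.
sealed trait Exp
case class Sym(name: String) extends Exp
case class Add(a: Exp, b: Exp) extends Exp
case class Mul(a: Exp, b: Exp) extends Exp

case class Rep(e: Exp) {
  def +(that: Rep): Rep = Rep(Add(e, that.e))
  def *(that: Rep): Rep = Rep(Mul(e, that.e))
}

object StagingSketch {
  // high-level code that reads like ordinary arithmetic
  def power3(x: Rep): Rep = x * x * x

  def main(args: Array[String]): Unit = {
    // instead of a number we obtain a graph that can be compiled to fast code
    println(power3(Rep(Sym("x"))).e)  // Mul(Mul(Sym(x),Sym(x)),Sym(x))
  }
}
```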

Multi-Stage Programming for straightforward development of Query Engines

Following these introductory results, Rompf turns to query engines. He first describes the usual procedure of transforming a query-language input into a logical and then a physical execution plan. These steps, Rompf notes, are similar to what a standard compiler does with source code; the difference is that query engines traditionally do not generate native code at the end. In earlier stages of database research, disk access was the dominant bottleneck, which made this optimization unnecessary. With the abundance of main memory nowadays, however, the translation into native code becomes more relevant.

As a milestone, Rompf mentions a paper that proposes using LLVM, a modular compiler framework, to compile queries for better performance [3]. This, however, is still a low-level approach. A second milestone is a paper published in 2016 in which a stack of intermediate languages is used to implement the generation of native code [4]. Given translations between these languages, this can yield efficient low-level code; the problem is that the languages and translations must all be defined and their correctness ensured. The paper concludes that 'creating a query compiler that produces highly optimized code is a formidable challenge.'

To simplify this, Rompf introduces the Futamura projections, formulated by Yoshihiko Futamura in 1971 [5]. The key insight of the first Futamura projection is that compilers and interpreters are not fundamentally different: an interpreter can be turned into a compiler by specializing it to a given program. However, the component that performs this specialization automatically (historically called 'mix') is difficult to implement. The problem can be sidestepped with an additional level of indirection, namely multi-stage programming (the second research result above), which makes use of the type system. If custom types are implemented such that their overloaded operations generate specialized code, this specialized code can be obtained without changing the original program. An interpreter implemented using these custom types becomes a 'staged interpreter' that can be used to compile code.
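A hypothetical sketch illustrates the idea: the same interpreter, written once against an abstract value domain, evaluates programs when instantiated with numbers and emits specialized code when instantiated with code-valued 'staged' numbers (represented here, for simplicity, as source strings).

```scala
// Hypothetical sketch of the first Futamura projection via staging:
// one interpreter, specialized into a compiler by swapping the value domain.
sealed trait Prog
case class Lit(v: Double) extends Prog
case object Input extends Prog                  // the single program input
case class Plus(a: Prog, b: Prog) extends Prog
case class Times(a: Prog, b: Prog) extends Prog

trait Num[T] {
  def lit(v: Double): T
  def add(a: T, b: T): T
  def mul(a: T, b: T): T
}

object FutamuraSketch {
  // a straightforward interpreter, generic in the value domain T
  def eval[T](p: Prog, x: T)(implicit n: Num[T]): T = p match {
    case Lit(v)      => n.lit(v)
    case Input       => x
    case Plus(a, b)  => n.add(eval(a, x), eval(b, x))
    case Times(a, b) => n.mul(eval(a, x), eval(b, x))
  }

  // ordinary values: eval behaves as an interpreter
  implicit val direct: Num[Double] = new Num[Double] {
    def lit(v: Double) = v
    def add(a: Double, b: Double) = a + b
    def mul(a: Double, b: Double) = a * b
  }

  // staged values are generated code: eval behaves as a compiler
  implicit val staged: Num[String] = new Num[String] {
    def lit(v: Double) = v.toString
    def add(a: String, b: String) = s"($a + $b)"
    def mul(a: String, b: String) = s"($a * $b)"
  }

  def main(args: Array[String]): Unit = {
    val p = Times(Plus(Input, Lit(1.0)), Input)  // (x + 1) * x
    println(eval(p, 2.0))  // interpretation: 6.0
    println(eval(p, "x"))  // compilation:    ((x + 1.0) * x)
  }
}
```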

Introducing Flare as an accelerator for Apache Spark

This result can be applied to query engines, since they are themselves interpreters. Using multi-stage programming in the implementation of query operators (e.g. different kinds of joins), a simple, obvious implementation can yield highly performant compiled code: only the types of a data record need to be changed to the code-generating custom types. Combining this insight with the benefits of object-oriented design, Rompf, referencing [6], concludes that writing an efficient query compiler is easy.
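A hypothetical sketch of this style, loosely following [6] (operator names and the string-based code generation are illustrative only): the operators are written as a plain push-based interpreter, but because record fields hold code expressions rather than data, running a query plan emits a specialized program.

```scala
// Hypothetical sketch in the spirit of [6]: a naive push-based query
// interpreter whose record fields are code strings, so that 'running'
// a query plan generates specialized code for exactly that query.
object QueryCompilerSketch {
  type Field  = String               // a staged value: generated code
  type Record = Map[String, Field]

  val out = new StringBuilder
  def emit(line: String): Unit = out.append(line).append('\n')

  // scan pushes each record to the next operator via a callback
  def scan(table: String, fields: List[String])(next: Record => Unit): Unit = {
    emit(s"for (row <- $table) {")
    next(fields.map(f => f -> s"row.$f").toMap)
    emit("}")
  }

  def filter(pred: Record => Field)(next: Record => Unit)(rec: Record): Unit = {
    emit(s"if (${pred(rec)}) {")
    next(rec)
    emit("}")
  }

  def project(fields: List[String])(rec: Record): Unit =
    emit(s"println(${fields.map(rec).mkString(", ")})")

  def main(args: Array[String]): Unit = {
    // SELECT name FROM users WHERE age > 30
    scan("users", List("name", "age")) {
      filter(r => s"${r("age")} > 30") { project(List("name")) }
    }
    print(out)  // the generated, specialized query code
  }
}
```

The generated code contains none of the interpreter's operator-dispatch overhead; it is essentially the specialized loop a programmer would write by hand for this one query.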

Applying these findings, Rompf presents Flare, a drop-in accelerator for Apache Spark [Link 2]. It uses the same high-level components and interfaces as Spark, while replacing Spark's low-level execution logic with a query compiler based on the principle presented above. Rompf also discusses why accelerating Spark is necessary: comparing Spark and C on TPC-H Query 6 as a benchmark, Spark is about 20 times slower than hand-written C code, due to the large overhead of executing the query. With the Flare query compiler, much better performance can be achieved while keeping Spark's high-level interface.

When using more than one core, Flare continues to outperform Spark in all regards. The limits to speed-up from adding cores prove to be the high frequency of bus communication and of communication within the distributed system. Flare is also much faster than Apache Flink, which on most queries is faster than Spark but again significantly slower than Flare. Thus Flare provides superior performance compared to both Apache Spark and Apache Flink.

Differentiable Programming using good software design

Finally, Rompf brings his prior results together for a very modern and highly demanded field: deep learning. Deep neural networks are essentially sets of weighted functions optimized using the gradient descent algorithm. For basic mathematical functions, we can use high-school calculus to compute derivatives. Recently, a research field called 'differentiable programming' has emerged, in which language experts differentiate through more complex language constructs such as loops and branches in order to fit more sophisticated programs to training data [Link 3].

Rompf presents a technique called 'forward-mode automatic differentiation'. Using type overloading, one declares a custom number type that additionally stores its derivative. Each mathematical operation is overloaded to update the derivative accordingly: adding a number B to a number A, for example, returns a number containing the sum of their values as well as the sum of their derivatives.

To calculate a derivative, we can use a higher-order function that takes the original function and its arguments. Since the custom number objects are used throughout, the derivative is computed automatically when the function is called.
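A hypothetical sketch of this technique (the type and function names are illustrative): a 'dual number' overloads arithmetic to propagate derivatives, and a higher-order grad function seeds the input derivative with 1.0.

```scala
// Hypothetical sketch of forward-mode automatic differentiation:
// a number that carries its derivative alongside its value.
case class NumF(x: Double, d: Double) {
  def +(that: NumF): NumF = NumF(x + that.x, d + that.d)
  def *(that: NumF): NumF = NumF(x * that.x, x * that.d + d * that.x) // product rule
}

object ForwardAD {
  // higher-order function: differentiate f at x by seeding d = 1.0
  def grad(f: NumF => NumF)(x: Double): Double = f(NumF(x, 1.0)).d

  def main(args: Array[String]): Unit = {
    println(grad(x => x * x * x)(2.0))  // d/dx x^3 at x = 2 yields 12.0
  }
}
```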

This is an easy-to-follow way of computing derivatives. Unfortunately, forward-mode differentiation requires one pass per input parameter and therefore does not scale to the large weight vectors usually seen in deep learning. To remedy this issue, backpropagation (or 'reverse-mode automatic differentiation') was developed: it first computes all intermediate results in a forward pass, caching them, and then propagates gradients through the network in a backward pass.

In a simplified view, backpropagation follows a 'continuation' pattern: solve the current step of the forward pass, trigger the computation of all following steps (the 'continuation'), and finally update the current gradient values once the result of the continuation arrives. In this style, the code above can be rewritten so that each operation takes a callback for the continuation of the forward pass, as follows.
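A hypothetical sketch of this continuation-passing style, modeled on the idea behind Lantern [Link 4] (names are illustrative): each operation computes its forward result, hands it to the callback k representing the rest of the forward pass, and accumulates gradients once k returns.

```scala
// Hypothetical sketch of reverse-mode AD with explicit continuations:
// forward results are computed before calling k, gradients after.
class NumR(val x: Double, var d: Double) {
  def +(that: NumR)(k: NumR => Unit): Unit = {
    val y = new NumR(x + that.x, 0.0)
    k(y)                     // run the remainder of the forward pass
    this.d += y.d            // backward: d(a+b)/da = d(a+b)/db = 1
    that.d += y.d
  }
  def *(that: NumR)(k: NumR => Unit): Unit = {
    val y = new NumR(x * that.x, 0.0)
    k(y)
    this.d += that.x * y.d   // backward: product rule
    that.d += this.x * y.d
  }
}

object ReverseAD {
  def grad(f: NumR => (NumR => Unit) => Unit)(x: Double): Double = {
    val in = new NumR(x, 0.0)
    f(in)(out => out.d = 1.0)  // seed the output gradient
    in.d
  }

  def main(args: Array[String]): Unit = {
    // d/dx (x * x * x) at x = 2, written with explicit callbacks
    println(grad(x => k => (x * x)(y => (y * x)(k)))(2.0))  // 12.0
  }
}
```

The nested callbacks in main are exactly the 'callback hell' seen earlier with asynchronous code, which motivates hiding them.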

Making use of his first research result from above [1], Tiark Rompf shows how these callbacks can be hidden from the programmer using shift/reset operators, resulting in very clean code.
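The following sketch is close to the formulation used in Lantern; note that it assumes Scala 2 with the (now legacy) delimited continuations compiler plugin enabled, which provides shift, reset, and the @cps type annotation.

```scala
import scala.util.continuations._

// Sketch in the style of Lantern: shift captures the rest of the forward
// pass as the continuation k, so user code stays in direct style.
class NumR(val x: Double, var d: Double) {
  def *(that: NumR) = shift { (k: NumR => Unit) =>
    val y = new NumR(x * that.x, 0.0)
    k(y)                      // remainder of the forward pass
    this.d += that.x * y.d    // gradient accumulation on the way back
    that.d += this.x * y.d
  }
}

object LanternStyle {
  def grad(f: NumR => NumR @cps[Unit])(x: Double): Double = {
    val in = new NumR(x, 0.0)
    reset { f(in).d = 1.0 }   // seed the output gradient inside reset
    in.d
  }

  def main(args: Array[String]): Unit = {
    println(grad(x => x * x * x)(2.0))  // 12.0, with no visible callbacks
  }
}
```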

His group at Purdue University combined these concepts into a machine learning framework prototype, Lantern [Link 4], which shows promising results in benchmarks.

Conclusion

Tiark Rompf combined his track record of novel program design techniques to build a highly efficient big data system (Flare) and to introduce a simple and effective implementation approach for differentiable programming. This new field in particular holds a lot of potential for even more complex machine learning models and dynamic software. It will be interesting to see what the future holds in this area.

Links

1. https://tiarkrompf.github.io/
2. https://flaredata.github.io/
3. https://towardsdatascience.com/deep-learning-from-a-programmers-perspective-aka-differentiable-programming-ec6e8d1b7c60
4. https://github.com/feiwang3311/Lantern

References

1. Rompf, Tiark / Maier, Ingo / Odersky, Martin (2009): Implementing First-Class Polymorphic Delimited Continuations by a Type-Directed Selective CPS-Transform. Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP).
2. Rompf, Tiark / Odersky, Martin (2010): Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. ACM SIGPLAN Notices.
3. Neumann, Thomas (2011): Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB 4(9), 539-550.
4. Shaikhha, Amir et al. (2016): How to Architect a Query Compiler. Proceedings of the ACM SIGMOD Conference 2016, 1907-1922.
5. Futamura, Yoshihiko (1971): Partial Evaluation of Computation Process - An Approach to a Compiler-Compiler. Systems, Computers, Controls 2(5), 45-50.
6. Rompf, Tiark / Amin, Nada (2015): Functional Pearl: A SQL to C Compiler in 500 Lines of Code. ACM SIGPLAN Notices 50(9), 2-9.