Template Vector Library
The first step towards this decoupling is to abstract the details of the underlying hardware; this can be seen as the syntactic prerequisite. The Template Vector Library (TVL) provides this kind of abstraction for vector-based programming. TVL thus allows a single code base for database operators: within that code base, only one line, the one specifying the processing style, has to be adapted to the underlying hardware.
TVL works as follows: a set of primitives is mapped to different backends, each supporting a different hardware platform. Programs are implemented against the primitives, and the mapping takes place at compile time. This compile-time mapping makes it possible, for example, to generate and debug a scalar version of an operator and then execute the vectorized version, which is particularly useful for debugging. The primitives are parameterized by the processing style, which comprises the hardware dialect (e.g., scalar) and the vector size. As mentioned above, this specification is the only part of the code base that has to be changed. TVL is open source and available on GitHub [3].
Experiments show that the overhead of this compile-time mapping is negligible.
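To make the mechanism concrete, the following is a minimal C++ sketch of such a compile-time mapping. All names (scalar_style, vec4_style, add_columns) are illustrative and not TVL's actual API; a real backend would implement the primitives with SIMD intrinsics instead of a plain array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical "scalar" processing style: a vector holds a single element.
struct scalar_style {
    using base_t   = uint64_t;
    using vector_t = uint64_t;
    static constexpr size_t width = 1;
    static vector_t load(const base_t* p)        { return *p; }
    static vector_t add(vector_t a, vector_t b)  { return a + b; }
    static void     store(base_t* p, vector_t v) { *p = v; }
};

// Hypothetical 4-wide style; a real backend would map these primitives
// to SIMD intrinsics (e.g. AVX2) instead of a plain array.
struct vec4_style {
    using base_t   = uint64_t;
    using vector_t = std::array<uint64_t, 4>;
    static constexpr size_t width = 4;
    static vector_t load(const base_t* p) { return {p[0], p[1], p[2], p[3]}; }
    static vector_t add(vector_t a, vector_t b) {
        for (size_t i = 0; i < width; ++i) a[i] += b[i];
        return a;
    }
    static void store(base_t* p, vector_t v) {
        for (size_t i = 0; i < width; ++i) p[i] = v[i];
    }
};

// The operator is written once against the primitives. Switching the
// hardware target means changing nothing but the Style template argument.
// (Remainder handling for n not divisible by the width is omitted.)
template <typename Style>
void add_columns(const uint64_t* a, const uint64_t* b,
                 uint64_t* out, size_t n) {
    for (size_t i = 0; i + Style::width <= n; i += Style::width) {
        auto va = Style::load(a + i);
        auto vb = Style::load(b + i);
        Style::store(out + i, Style::add(va, vb));
    }
}
```

Calling add_columns<scalar_style>(...) yields the scalar build that is convenient to debug, while add_columns<vec4_style>(...) yields the vectorized variant. Since the style is a template parameter, the selection is resolved entirely at compile time.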
Virtual Vector Library
Following the abstraction of the syntactic specifics of the underlying hardware, the next step is to decouple the application vector size from the hardware vector size. The Virtual Vector Library (VVL) is built on top of TVL and enables mapping virtual vectors to hardware implementations, using the same primitives as TVL. This allows for parallel, sequential, or mixed execution, depending on what is beneficial for the application at hand; the goal is to allow for runtime reconfiguration. To enable the mapping of virtual vectors, the processing style is extended by the virtual vector style, which includes information about the vector length, the vector extension, and the thread count. That way, the system can resolve larger virtual vectors into smaller ones on the hardware side.
Again, as only the processing style has to be adapted, there is no need to change the actual program logic. At the same time, changing only the processing style opens up a high degree of freedom: arbitrary combinations of the number of elements per hardware vector, the number of vectors per core, and the number of threads/cores running in parallel are possible. The mapping of virtual vectors thus becomes an optimization problem.
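Building on the sketch above, a virtual processing style can be expressed as a wrapper that resolves each virtual vector into several hardware vectors. The names are again illustrative rather than VVL's actual interface, and the thread-count dimension is omitted: the parts are executed sequentially on one core, whereas a real implementation could also distribute them across threads.

```cpp
#include <cstddef>

// Hypothetical virtual style: a virtual vector of VWIDTH elements is
// resolved into VWIDTH / Hw::width consecutive hardware vectors.
template <typename Hw, size_t VWIDTH>
struct virtual_style {
    static_assert(VWIDTH % Hw::width == 0,
                  "virtual width must be a multiple of the hardware width");
    using base_t = typename Hw::base_t;
    static constexpr size_t width = VWIDTH;
    static constexpr size_t parts = VWIDTH / Hw::width;

    struct vector_t { typename Hw::vector_t part[parts]; };

    static vector_t load(const base_t* p) {
        vector_t v;
        for (size_t i = 0; i < parts; ++i)
            v.part[i] = Hw::load(p + i * Hw::width);
        return v;
    }
    static vector_t add(vector_t a, vector_t b) {
        for (size_t i = 0; i < parts; ++i)
            a.part[i] = Hw::add(a.part[i], b.part[i]);
        return a;
    }
    static void store(base_t* p, vector_t v) {
        for (size_t i = 0; i < parts; ++i)
            Hw::store(p + i * Hw::width, v.part[i]);
    }
};

// The unchanged add_columns kernel now processes 16-element virtual
// vectors, each resolved into four 4-wide hardware operations:
//   add_columns<virtual_style<vec4_style, 16>>(a, b, out, n);
```

Because the virtual style exposes the same primitive interface as a hardware style, the operator code stays untouched; only the template argument, i.e. the processing style, changes.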
Vector Sharing Library
As experiments show that bigger vectors are not always better and that smaller vectors can also be beneficial depending on the application, there should also be a way to apply the mapping in the opposite direction. The Vector Sharing Library (VSL) therefore enables small vectors on the application layer to be merged and executed jointly on the hardware layer. This is done by using vector registers as shared hardware resources. Like VVL, VSL is built on top of TVL (as shown in Fig. 3).
This approach especially applies to highly selective queries that start with a lot of data but end with only a few rows. Large vectors are beneficial at the beginning, but towards the end, big parts of the vectors are no longer used. The idea is to optimize the workload through multi-query execution: multiple predicates are compared against one data object, so the query is shared instead of the data. Accordingly, this is referred to as SIMQ (Same Instruction Multiple Query). An obvious shortcoming of pure query sharing is that data parallelism is lost. To address this, a combination of both approaches is proposed; the design space is then defined by the degree to which data sharing and query sharing, respectively, are used. By moving within this design space, the best degree of fine-grained parallelism can be found.
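The following is a minimal sketch of the SIMQ idea, with hypothetical names: four selection queries of the form value < constant share one register, each query's constant occupying one lane of the predicate vector, and every data element is compared against all four predicates at once. The inner loop spells out lane by lane what a hardware backend would do with a single broadcast and one vector compare.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical SIMQ kernel: counts, for each of four queries of the form
// "value < constant", how many elements qualify. The four constants live
// in one 4-wide predicate vector; the data is shared across the queries.
std::array<uint64_t, 4> simq_count_less_than(
        const uint64_t* data, size_t n,
        const std::array<uint64_t, 4>& predicates) {
    std::array<uint64_t, 4> counts{};
    for (size_t i = 0; i < n; ++i) {
        // On real hardware: broadcast data[i] into all lanes, then one
        // vector compare against the predicate register. Here each lane
        // is written out explicitly.
        for (size_t q = 0; q < 4; ++q)
            counts[q] += (data[i] < predicates[q]) ? 1 : 0;
    }
    return counts;
}
```

Combining this with data parallelism, for example two queries times two data elements per register, yields one point in the design space described above.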
Coming back to the overarching goal: vectors are used as a technical vehicle for optimization. With VSL, not only data parallelism but also the sharing of queries becomes applicable.
Summary
Given the challenge that applications and infrastructures have to be provisioned and scaled ever faster, composable systems in combination with vectorization are a promising solution approach. Providing cache-coherent memory across multiple hardware components opens up new possibilities for processing data at the software level. The idea behind vectorization, namely that large amounts of data can be processed in parallel with a small set of instructions, is a promising programming paradigm. However, different hardware also brings different problems that need to be solved.
The presented vector libraries aim at decoupling what the underlying hardware provides from what the software requires. The Template Vector Library provides the syntactic prerequisites by abstracting the details of the underlying hardware, while the Virtual Vector Library and the Vector Sharing Library are concerned with decoupling the vector size used by the application from the hardware vector size. VVL does this by mapping virtual vectors to hardware vectors, enabling the use of larger vectors than the hardware provides; VSL enables merging smaller vectors and executing them jointly. In general, the vector libraries allow leveraging vectors as a technical vehicle for optimization.
The approach of the research group shows great potential and leaves room for many new ideas. It remains to be seen which ideas the research team will implement in the future and what results they will present. The current status of the research can be followed in the TVL repository on GitHub [3].
References
[1] Lehner, W. (2021). Flexible Vector Processing for Database Engines. Presentation.
[2] Li, Y., & Patel, J. M. (2013). BitWeaving: Fast scans for main memory data processing. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 289–300).
[3] MorphStore (2021). TVL GitHub repository. github.com/MorphStore/TVLLib.git.