Deploying trained machine learning pipelines inside database systems allows users to integrate predictions into downstream queries, leveraging the server’s hardware and avoiding data transfer. For users of a pipeline deployed in a database system, consistent inference performance is often more important than training performance.
Typically, inference consists of a preprocessing part that transforms a data point using different trained preprocessing operators and a scoring part that takes the preprocessed data point and uses the learned parameters of a machine learning algorithm to produce predictions. Depending on the complexity and number of operators, preprocessing can dominate the inference latency. Conversely, if the machine learning algorithm is complex, e.g., a deep neural network, scoring dominates the inference latency.
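The two stages above can be illustrated with a minimal sketch. All parameters here are hypothetical stand-ins: a trained standard scaler plays the preprocessing part and a learned linear classifier plays the scoring part.

```python
# Hypothetical trained parameters (illustrative values, not from any real pipeline).
means = [5.0, 100.0]     # scaler means, learned during training
stds = [2.0, 50.0]       # scaler standard deviations
weights = [0.7, -0.3]    # linear model weights
bias = 0.1               # linear model bias

def preprocess(x):
    """Preprocessing part: apply the trained scaler to one raw data point."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]

def score(x_pre):
    """Scoring part: apply the learned linear model to the preprocessed point."""
    z = sum(w * v for w, v in zip(weights, x_pre)) + bias
    return 1 if z > 0 else 0

def predict(x):
    """End-to-end inference: preprocessing followed by scoring."""
    return score(preprocess(x))

# predict([7.0, 50.0]) scales the point to [1.0, -1.0], then scores it as class 1.
```

With many chained operators, the `preprocess` step grows while `score` stays cheap; with a deep model, the balance reverses.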
Current approaches to in-database inference implement costly preprocessing operators and machine learning algorithms in the database either natively, by transpiling code into SQL representations of the operators, or by optimizing the execution of user-defined functions in guest languages such as Python. Instead, we propose a novel framework that approximates any trained machine learning pipeline end-to-end with a trie-based data structure, avoiding costly preprocessing and scoring. We leverage well-studied trie-based structures from database systems to organize data embeddings, together with supervised feature discretization techniques, to create an index over the outcomes of the trained pipeline. On several classification and regression tasks, our framework achieves a significant decrease in inference latency while maintaining effectiveness similar to that of the pipeline it approximates.
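The idea of replacing pipeline execution with an index lookup can be sketched as follows. This is a hedged illustration, not the paper's implementation: the discretization thresholds, the dictionary-based trie, and the stored labels are all hypothetical placeholders.

```python
# Sketch: approximate a trained pipeline with a trie keyed on discretized features.
import bisect

# Hypothetical per-feature discretization thresholds (in the paper these would
# come from supervised feature discretization, not be hand-picked).
thresholds = [[2.0, 5.0], [10.0, 20.0]]

def discretize(x):
    """Map a raw data point to a path of bin ids, one bin per feature."""
    return [bisect.bisect_right(t, v) for t, v in zip(thresholds, x)]

def trie_insert(trie, path, outcome):
    """Store a precomputed pipeline outcome at the end of a bin path."""
    node = trie
    for b in path[:-1]:
        node = node.setdefault(b, {})
    node[path[-1]] = outcome

def trie_lookup(trie, path):
    """Inference: walk the trie along the bin path; no preprocessing or scoring."""
    node = trie
    for b in path:
        node = node[b]
    return node

# Build the index by recording the trained pipeline's outcomes for
# representative points (labels "A"/"B" are placeholders for real predictions).
trie = {}
trie_insert(trie, discretize([1.0, 5.0]), "A")   # falls in bins (0, 0)
trie_insert(trie, discretize([6.0, 25.0]), "B")  # falls in bins (2, 2)
```

At query time, any point mapping to the same bins returns the stored outcome, e.g. `trie_lookup(trie, discretize([1.5, 8.0]))` retrieves `"A"`. The approximation quality depends entirely on how well the discretization separates the pipeline's decision regions.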