Stream processing is an essential part of modern data infrastructure, but building an efficient and scalable stream processing system can be challenging. Decoupling compute from storage has become an effective way to address these challenges.
In this talk, we discuss the benefits and limitations of the decoupled compute and storage architecture in stream processing systems. We find that, while decoupling compute from storage can help achieve virtually unlimited scalability, it can also introduce data consistency problems and high latency, especially when processing complex continuous queries that must manage very large internal state. We then present our solution to these challenges: a tiered storage mechanism. The tiered storage approach combines high-performance and low-cost storage tiers to minimize data movement between the compute and storage layers while maintaining efficient processing. At the end of the talk, we present experimental results that demonstrate the balance between performance and cost-efficiency achieved by our proposed approach.