How We Built SQL Rollups on Streaming Data
Rollups in Rockset pre-aggregate streaming data at ingest time. Learn how we built SQL rollups to support exactly-once write semantics and process out-of-order arrivals correctly.
More Details
Rockset is a real-time analytics database for serving fast search and analytics at scale. We built SQL rollups in Rockset that pre-aggregate data from streaming sources such as Apache Kafka and Amazon Kinesis. Because a rollup stores aggregated rows rather than every raw event, it can improve both storage efficiency and query performance.
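To make the idea of pre-aggregation concrete, here is a minimal sketch in Python. It is an illustration only, not Rockset's implementation: the `rollup` function, the event fields (`ts`, `key`, `value`), and the 60-second bucket size are all assumptions made for this example.

```python
from collections import defaultdict


def rollup(events, bucket_seconds=60):
    """Pre-aggregate raw events into one row per (time bucket, key).

    Hypothetical helper: field names and bucket size are assumptions.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        # Truncate the event timestamp down to the start of its bucket.
        bucket = event["ts"] - (event["ts"] % bucket_seconds)
        row = totals[(bucket, event["key"])]
        row["count"] += 1
        row["sum"] += event["value"]
    return dict(totals)


events = [
    {"ts": 0, "key": "a", "value": 1.0},
    {"ts": 30, "key": "a", "value": 2.0},
    {"ts": 61, "key": "a", "value": 4.0},
    {"ts": 10, "key": "b", "value": 8.0},
]
rolled = rollup(events)
```

In this sketch four raw events collapse into three stored rows. Queries that only need counts or sums per bucket then read the smaller aggregated set instead of the raw events.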
Over the course of this project, we encountered multiple challenges in building SQL rollups on streaming data, including:
- supporting exactly-once write semantics
- executing SQL on streaming data at ingest time
- processing out-of-order arrivals correctly
In this talk, we discuss how we overcame these technical challenges to implement rollups.
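As a rough illustration of two of the challenges listed above, the sketch below shows one common way to approach them. It is not the design presented in the talk. It assumes each message carries a partition and a monotonically increasing offset (as in Kafka), skips redelivered messages by remembering the last applied offset, and buckets by event time so that late arrivals still land in the correct aggregate. The `RollupState` class and its field names are hypothetical.

```python
from collections import defaultdict


class RollupState:
    """In-memory rollup with offset-based deduplication (illustrative only)."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self.totals = defaultdict(int)  # (bucket, key) -> running sum
        self.applied_offset = {}        # partition -> last applied offset

    def apply(self, partition, offset, event):
        # Skip any message at or below the last applied offset, so a
        # redelivery after a retry is not counted twice.
        if offset <= self.applied_offset.get(partition, -1):
            return False
        # Bucket by the event's own timestamp, not its arrival time, so an
        # out-of-order event updates the bucket it belongs to.
        bucket = event["ts"] - event["ts"] % self.bucket_seconds
        self.totals[(bucket, event["key"])] += event["value"]
        # Assumption: a durable system would commit the aggregate and the
        # offset atomically; here both live in memory.
        self.applied_offset[partition] = offset
        return True


state = RollupState()
first = state.apply(0, 0, {"ts": 65, "key": "a", "value": 5})
late = state.apply(0, 1, {"ts": 10, "key": "a", "value": 1})   # older event time
dup = state.apply(0, 1, {"ts": 10, "key": "a", "value": 1})    # redelivery
```

Because a sum is commutative, the order in which events reach a bucket does not change the result. The offset check is what keeps a retried write from being applied twice.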
About the Speakers
Tudor Bosman leads architecture for Rockset's search and analytics engine. Prior to Rockset, Tudor was an engineer at Facebook, where he spearheaded Unicorn, Facebook's search engine, and built infrastructure for the Facebook AI Research Lab and Facebook's applied machine learning initiative. Prior to Facebook, Tudor worked at Google on Gmail's storage and indexing backend, and at Oracle on database server internals. Tudor holds an MS in Computer Science from Stanford and a BS in Computer Science from Caltech.
Karen Li is a software engineer at Rockset on the Systems team, which is responsible for Rockset's distributed SQL query engine. She joined Rockset after graduating from UCLA in 2019 with a bachelor's in computer science. Some highlights of her time at Rockset include optimizing distributed aggregations, debugging gnarly production bugs, and helping implement SQL-based rollups.