Big things have been happening behind the scenes for the inaugural Monster SCALE Summit. Ever since we introduced it at P99 CONF, the community response has been overwhelming. We’re now faced with the “good” problem of fitting all the selected speakers into the two half-days we set aside for the event. 😅
If you missed the intro last year, Monster SCALE Summit is a highly technical conference that connects the community of professionals designing, implementing, and optimizing performance-sensitive, data-intensive applications. It focuses on exploring “monster scale” engineering challenges: extreme levels of throughput, data, and global distribution. The two-day event is free, intentionally virtual, and highly interactive.
Register – it’s free and virtual
We’ll be announcing the agenda next month. But we’re so excited about the speaker lineup that we can’t wait to share a taste of what you can expect. Here’s a preview of 12 of the 60+ sessions that you can join on March 11 and 12…
Designing Data-Intensive Applications in 2025
Martin Kleppmann and Chris Riccomini (Designing Data-Intensive Applications book)
Join us for an informal chat with Martin Kleppmann and Chris Riccomini, who are currently revising the famous book Designing Data-Intensive Applications. We’ll cover how data-intensive applications have evolved since the book was first published, the top tradeoffs people are negotiating today, and what they believe is next for data-intensive applications. Martin and Chris will also provide an inside look at the book-writing and revision process.
The Nile Approach: Re-engineering Postgres for Millions of Tenants
Gwen Shapira (Nile)
Scaling relational databases is a notoriously challenging problem. Doing so while maintaining consistently low latency, efficient use of resources, and compatibility with Postgres may seem impossible. At Nile, we decided to tackle the scaling challenge by focusing on multi-tenant applications. These applications require not only scalability, but also a way to isolate tenants and avoid the noisy neighbor problem. By tackling both challenges, we developed an approach we call “virtual tenant databases,” which gives us an efficient way to scale Postgres to millions of tenants while still maintaining consistent performance.
In this talk, I’ll explore the limitations of traditional scaling for multi-tenant applications and share how Nile’s virtual tenant databases address these challenges. By combining Postgres’ existing capabilities, distributed algorithms, and a new storage layer, Nile re-engineered Postgres for multi-tenant applications at scale.
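To make the general idea concrete before the talk, here is a minimal sketch of tenant-aware placement and routing. This is purely illustrative, not Nile’s actual design or API; all names (`backend_for_tenant`, `scoped_query`, `N_BACKENDS`) are hypothetical. The core intuition: each tenant is logically its own database, but tenants are packed onto a fixed pool of physical Postgres backends.

```python
import hashlib

# Hypothetical sketch, not Nile's implementation: pack many logical
# "virtual tenant databases" onto a small pool of physical backends.

N_BACKENDS = 16  # illustrative number of physical Postgres instances


def backend_for_tenant(tenant_id: str) -> int:
    """Deterministically place a tenant on one physical backend."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_BACKENDS


def scoped_query(tenant_id: str, sql: str) -> tuple[int, str]:
    """Route a query to the tenant's backend, scoped to that tenant.

    In a real system the tenant filter would be enforced server-side
    (e.g. via row-level security), not by annotating the SQL string.
    """
    backend = backend_for_tenant(tenant_id)
    return backend, f"{sql} /* tenant = {tenant_id} */"
```

Because placement is deterministic, the same tenant always lands on the same backend, and isolating a noisy tenant becomes a matter of moving one logical database rather than resharding shared tables.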
The Mechanics of Scale
Dominik Tornow (Resonate HQ)
As distributed systems scale, the complexity of their development and operation skyrockets. A dependable understanding of the mechanics of distributed systems is our most reliable parachute.
In this talk, we’ll use systems thinking to develop an accurate and concise mental model of concurrent, distributed systems, their core challenges, and the key principles to address these challenges. We’ll explore foundational problems such as the tension between consistency and availability, and essential techniques like partitioning and replication.
Whether you are building a new system from scratch or scaling an existing system to new heights, this talk will provide the understanding to confidently navigate the intricacies of modern, large-scale distributed systems.
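As a taste of the mental model the talk builds, the two key techniques it names can be sketched in a few lines. This is a generic illustration (not from the session): partitioning splits the keyspace across nodes, and replication stores each partition on several nodes so the system survives failures.

```python
# Generic sketch of partitioning + replication, not code from the talk.
from hashlib import sha256

NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 2


def partition(key: str) -> int:
    """Hash-partition the keyspace: each key maps to one primary node."""
    return int.from_bytes(sha256(key.encode()).digest()[:4], "big") % len(NODES)


def replicas(key: str) -> list[str]:
    """Primary plus the next (RF - 1) nodes in ring order."""
    start = partition(key)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]
```

Losing one node still leaves a surviving copy of every key, which buys availability; keeping those copies in agreement is exactly the consistency tension the talk explores.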
Feature Store Evolution Under Cost Constraints: When Cost is Part of the Architecture
Ivan Burmistrov and David Malinge (ShareChat)
At P99 CONF 2023, the ShareChat team presented how they scaled their ML Feature Store to handle 1 billion features per second. Once the system was scaled to handle the load, the team faced the next challenge: extreme cost constraints. They needed to make the same quality of system much cheaper to run.
Ivan and David will talk about the approaches the team implemented to optimize for cost in the cloud while maintaining the same SLA for the service. The talk will cover advanced optimizations at various levels to bring down compute, minimizing waste when running on Kubernetes, autoscaling challenges for stateful Apache Flink jobs, and more. It should be useful for anyone building or optimizing an ML Feature Store, or looking into cloud cost optimization in general.
Time Travelling at Scale
Richard Hart (Antithesis)
Antithesis is a continuous reliability platform that autonomously searches for problems in your software within a simulated environment. Every problem we find can be perfectly reproduced, allowing for efficient debugging of even the most complex problems. But storing and querying histories of program execution at scale creates monstrously large cardinalities. Over a ~10-hour test run, we generate ~1 billion rows. The solution: our own tree database.
30B Images and Counting: Scaling Canva’s Content-Understanding Pipelines
Dr. Kerry Halupka (Canva)
As the demand for high-quality, labeled image data grows, building systems that can scale content understanding while delivering real-time performance is a formidable challenge. In this talk, I’ll share how we tackled the complexities of scaling content understanding pipelines to support monstrous volumes of data, including backfilling labels for over 30 billion images.
At the heart of our system is an extreme label classification model capable of handling thousands of labels and scaling seamlessly to thousands more. I’ll dive into the core components: candidate image search, zero-shot labelling using highly trained teacher models, and iterative refinement with visual critic models. You’ll learn how we balanced latency, throughput, and accuracy while managing evolving datasets and continuously expanding label sets. I’ll also discuss the tradeoffs we faced—such as ensuring precision in labelling without compromising speed—and the techniques we employed to optimise for scale, including strategies to address data sparsity and performance bottlenecks.
By the end of the session, you’ll gain insights into designing, implementing, and scaling content understanding systems that meet extreme demands. Whether you’re working with real-time systems, distributed architectures, or ML pipelines, this talk will provide actionable takeaways for pushing large-scale labelling pipelines to their limits and beyond.
How Agoda Scaled 50x Throughput with ScyllaDB
Worakarn Isaratham (Agoda)
In this talk, we will explore the performance tuning strategies implemented at Agoda to optimize ScyllaDB. Key topics include enhancing disk performance, selecting the appropriate compaction strategy, and adjusting SSTable settings to match our usage profile.
Who Needs One Database Anyway?
Glauber Costa (Turso)
Developers need databases. That’s how you store your data. And that’s usually how it goes: you have your large fleet of services, and they connect to one database. But what if it wasn’t like that? What if, instead of one database, an application created one million databases, or even more? In this talk, we’ll explore the market trends that give rise to use cases where this pattern is beneficial, and the infrastructure changes needed to support it.
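To give a feel for the pattern, here is a toy illustration (not Turso’s implementation) of “one database per tenant,” using SQLite, which is fitting since Turso builds on libSQL, a SQLite fork. Each tenant gets its own database file, created lazily on first access; the `tenant_db` helper and the `notes` schema are invented for this example.

```python
# Toy sketch of the database-per-tenant pattern; not Turso's actual code.
import sqlite3
import tempfile
from pathlib import Path

DB_ROOT = Path(tempfile.mkdtemp())


def tenant_db(tenant_id: str) -> sqlite3.Connection:
    """Open (and lazily create) a dedicated database for one tenant."""
    conn = sqlite3.connect(DB_ROOT / f"{tenant_id}.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return conn


# Tenants are fully isolated: no shared tables, no noisy neighbors.
with tenant_db("alice") as db:
    db.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

with tenant_db("bob") as db:
    rows = db.execute("SELECT COUNT(*) FROM notes").fetchone()
    # bob's database is empty; alice's insert is invisible here.
```

With a database this cheap to create, per-tenant operations like backup, restore, or deletion reduce to operations on a single file, which is part of what makes the million-database pattern attractive.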
How We Boosted ScyllaDB’s Data Streaming by 30x
Asias He (ScyllaDB)
Streaming, the process of scaling out to or in from other nodes, used to process every partition one by one. It was too slow and depended on the schema. File-based streaming is a new feature that significantly optimizes tablet movement. It streams entire SSTable files without deserializing them into mutation fragments and re-serializing them back into SSTables on the receiving nodes. As a result, less data is streamed over the network and less CPU is consumed, especially for data models that contain small cells.
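The intuition behind the speedup can be sketched schematically. This is not ScyllaDB’s code; both functions below are invented stand-ins that contrast the two paths: the old one deserializes and re-serializes every row, while the new one ships the SSTable file as opaque bytes.

```python
# Schematic contrast of mutation-level vs. file-based streaming;
# invented for illustration, not ScyllaDB's implementation.
import shutil


def stream_by_mutation(src_rows, dst):
    """Old path: parse, send, and rebuild each row individually (CPU-heavy)."""
    for row in src_rows:
        mutation = dict(row)        # deserialize into a mutation fragment
        dst.append(dict(mutation))  # re-serialize on the receiving node


def stream_by_file(src_path, dst_path):
    """New path: copy the SSTable file wholesale, with no per-row work."""
    shutil.copyfile(src_path, dst_path)
```

Skipping the per-row round trip is why the win is largest for data models with many small cells, where serialization overhead dominates the actual payload.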
Evolving Atlassian Confluence Cloud for Scale, Reliability, and Performance
Bhakti Mehta (Atlassian)
This session covers the journey of Confluence Cloud – the team workspace for collaboration and knowledge sharing used by thousands of companies – and how we aim to take it to the next level, with scale, performance, and reliability as the key motivators.
This session takes a deep dive into how the Confluence architecture has evolved into its current form. It discusses how Atlassian deploys, runs, and operates at scale, and the challenges encountered along the way.
I will cover performance and reliability at scale, starting with the fundamentals: measuring everything, redefining metrics to reflect actual customer pain, and auditing end-to-end experiences. Beyond DevOps and best practices, this means empowering teams to own product stability through practices and tools.
Two Leading Approaches to Data Virtualization: Which Scales Better?
Dr. Daniel Abadi (University of Maryland)
You have a large dataset stored in location X, and some code to process or analyze it in location Y. Which is better: move the code to the data, or move the data to the code? For decades, it has been assumed that the former approach is more scalable. Recently, with the rise of cloud computing and the push to separate resources for storage and compute, we have seen data increasingly being pushed to code, flying in the face of conventional wisdom. What is behind this trend, and is it a dangerous idea? This session will look at this question from academic and practical perspectives, with a particular focus on data virtualization, where there is an ongoing debate on the merits of push-based vs. pull-based data processing.
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System
Dmytro Hnatiuk (Wise)
Scaling a system from 66 million to over 25 billion records is no easy feat—especially when it’s a core financial system where every number has to be right, and data needs to be fresh right now. In this session, I’ll share the ups and downs of managing this kind of growth without losing my sanity. You’ll learn how to balance high data accuracy with real-time performance, optimize your app logic, and avoid the usual traps of database scaling. This isn’t about turning you into a database expert—it’s about giving you the practical, no-BS strategies you need to scale your systems without getting overwhelmed by technical headaches. Perfect for engineers and architects who want to tackle big challenges and come out on top.