ScyllaDB Cloud vs DynamoDB

Performance Report: ScyllaDB Cloud outperforms Amazon’s DynamoDB and is significantly less expensive for similar workloads.

ScyllaDB Cloud vs DynamoDB Benchmark Overview

ScyllaDB has recently experienced a significant increase in the number of DynamoDB users moving to ScyllaDB Cloud, our database-as-a-service offering. Price performance is cited as a primary driver in virtually all of these interactions.
To help teams better assess whether a move makes sense, we decided to perform a detailed DynamoDB benchmark with a price-performance comparison analyzing:

  • How cost compares across both DynamoDB pricing models under various workload conditions, distributions, and read:write ratios
  • How latency compares across a variety of workload conditions

Summary of ScyllaDB Cloud vs DynamoDB Benchmark Results

This benchmark report outlines the detailed findings, but here’s the bottom line: ScyllaDB costs are significantly lower than DynamoDB costs in all but one scenario. In realistic workloads, costs would be 5X to 40X lower — with up to 4X better P99 latency.

Here is a consolidated look at how DynamoDB and ScyllaDB compare on cost and performance for just one of the many workloads we tested. DynamoDB shines with a Uniform distribution and struggles with the others; we chose to highlight a case where it shines.

scylladb_vs_dynamodb_benchmark1

Cost Results

For our cost comparisons, we started with a range of throughput + workload scenarios and calculated their cost on ScyllaDB Cloud with the AWS instances specified in the extended report. Next, we calculated the cost of running the same throughput + workload scenarios on DynamoDB. We needed to know:

  • The item size (to compute the WCUs and RCUs used per operation)
  • The monthly price for 1 TB of storage
  • The cost per unit

We used an item size of 1081 bytes (calculated using the DynamoDB documentation). Because DynamoDB meters writes in 1 KB units and reads in 4 KB units, this translates to 2 WCUs per write operation and 1 RCU per read operation. Storing 1 TB costs ~$250/month. The cost per unit varies according to the pricing model: provisioned or on-demand.
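The unit arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the publicly documented DynamoDB metering rules (one write unit covers a write of up to 1 KB, one read unit covers a strongly consistent read of up to 4 KB). The function names and the $0.25 per GB-month storage price are illustrative assumptions chosen to match the ~$250/month figure; they are not taken from the report's own calculations.

```python
import math

WCU_BYTES = 1024       # one WCU covers a write of up to 1 KB
RCU_BYTES = 4 * 1024   # one RCU covers a strongly consistent read of up to 4 KB


def wcus_per_write(item_size_bytes: int) -> int:
    """Write capacity units consumed by one write of the given item size."""
    return max(1, math.ceil(item_size_bytes / WCU_BYTES))


def rcus_per_read(item_size_bytes: int) -> int:
    """Read capacity units consumed by one strongly consistent read."""
    return max(1, math.ceil(item_size_bytes / RCU_BYTES))


def monthly_storage_cost(gigabytes: float, price_per_gb_month: float = 0.25) -> float:
    """Storage cost per month; the default price is an assumption."""
    return gigabytes * price_per_gb_month


ITEM_SIZE = 1081  # bytes, the item size used in this benchmark
print(wcus_per_write(ITEM_SIZE))   # 2
print(rcus_per_read(ITEM_SIZE))    # 1
print(monthly_storage_cost(1000))  # 250.0
```

Note that the item is only 57 bytes over the 1 KB boundary, yet every write is billed as two units.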

For ScyllaDB, we used a 3-node cluster in all modes and priced it at hourly (on-demand) rates. Users can scale the cluster out or in at any point, and because the platform scales with the amount of resources, the price changes linearly with the performance level you require. Annual pricing provides a significant cost reduction but is outside the scope of this benchmark.

DynamoDB has two modes for non-annual pricing: provisioned and on-demand pricing. Provisioned mode is recommended if your workloads are reasonably predictable. On-demand pricing is significantly more expensive and is a fit for unpredictable workloads. It is possible to combine modes, add auto-scaling, and so forth. DynamoDB provides considerable flexibility around managing the cost and scale of the aforementioned options, but this also results in considerable complexity.

We measured ScyllaDB’s performance across all workload mixes and distributions, then compared its cost to that of DynamoDB provisioned with the read and write capacity units those workloads require. ScyllaDB’s on-demand pricing can be estimated via our pricing calculator. To see how we calculated costs, refer to the Cost Calculations section of the full report’s Appendix.

Provisioned Mode Cost Comparisons

Provisioned mode is recommended if your workloads are reasonably predictable: with DynamoDB, you need to predict the read and write capacity units that each table will require.

dynamodb_cost_1 graph
dynamodb_cost_2 graph
dynamodb_cost_3 graph

With just one exception, DynamoDB’s cost estimates were consistently higher than ScyllaDB’s – and much more so for the most write-heavy workloads. This is not surprising, given that DynamoDB charges 5X more for writes than for reads, while ScyllaDB does not differentiate between operations and bases its pricing on the actual cluster size.
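To see why write-heavy mixes dominate the DynamoDB bill, here is a rough sketch that expresses a workload's capacity cost in "read-unit equivalents" rather than dollars. It uses only the 5X write-to-read price ratio stated above and the 2 WCUs per write and 1 RCU per read from the cost setup. The function name is hypothetical, and the sketch deliberately ignores storage, auto-scaling, and actual unit prices.

```python
WRITE_UNIT_PRICE_RATIO = 5  # per the text: a write unit costs 5X a read unit
WCUS_PER_WRITE = 2          # 1081-byte item
RCUS_PER_READ = 1


def relative_dynamodb_cost(ops_per_sec: float, write_fraction: float) -> float:
    """Capacity cost of a workload, expressed in read-unit equivalents."""
    writes = ops_per_sec * write_fraction
    reads = ops_per_sec - writes
    return reads * RCUS_PER_READ + writes * WCUS_PER_WRITE * WRITE_UNIT_PRICE_RATIO


read_only = relative_dynamodb_cost(100_000, 0.0)
for write_fraction in (0.0, 0.5, 1.0):
    cost = relative_dynamodb_cost(100_000, write_fraction)
    print(f"{write_fraction:.0%} writes: {cost / read_only:.1f}x the read-only cost")
# 0% writes: 1.0x the read-only cost
# 50% writes: 5.5x the read-only cost
# 100% writes: 10.0x the read-only cost
```

Under these assumptions a single write costs as much as ten reads, while the cost of a fixed-size ScyllaDB cluster does not depend on the mix.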

We also compared pricing using DynamoDB’s on-demand pricing model. For details, see the complete benchmark report.

Performance Results

Next, let’s shift the focus to performance. For use cases that require near real-time responses (e.g., AdTech, messaging, gaming, IoT), P99 latency is critical for meeting SLAs and delivering an engaging user experience.

Uniform Distribution

Here are the performance results for the Uniform distribution.

dynamodb_latency1 graph

ScyllaDB’s latency was significantly lower than that of DynamoDB – at a fraction of the cost. An even larger reduction in P99 latency and mean latency (under 1 ms) could be achieved by using a larger ScyllaDB cluster and reducing the utilization from 75% to the 30%-50% range. Additionally, ScyllaDB comes with several options which might further improve latencies, such as the BYPASS CACHE extension, designed for workloads that don’t make effective use of caching (e.g., a workload that is much larger than the RAM with Uniform distribution). Such options have not been applied for the purpose of this performance testing.

Hotspot Distribution

Now, let’s shift to a more realistic distribution: Hotspot. As explained in the full report’s Appendix, the Hotspot distribution has some sets of values accessed more frequently than others.

dynamodb_latency2 graph

As you can see, the discrepancy between DynamoDB and ScyllaDB latencies widened for these more realistic workloads. Since a portion of the requests had a higher chance of hitting an object from the hot set, the probability of ScyllaDB’s cache being touched during reads increased significantly. The results show that ScyllaDB P99 latencies decreased compared to the previous Uniform results as the read ratio increased.

The next graph shows P99 latency as a function of ScyllaDB’s utilization. Note how P99 latency drops as the cluster load decreases:

dynamodb_latency3 graph

Zipfian Distribution

Finally, let’s look at the interesting case of the Zipfian distribution. Zipfian takes the hotspot pattern to new levels by simulating access patterns in which an item’s access frequency falls off as a power of its popularity rank (its “hotness”). As Alex DeBrie notes, Zipfian is generally problematic for DynamoDB: “The most popular items are accessed orders of magnitude more than the average item. Because DynamoDB wants a more even distribution of your data, your application may get throttled as it tries to access popular items.”

As expected, DynamoDB performed poorly on these workloads.

dynamodb_latency4 graph

As noted by DeBrie, the Zipfian distribution is likely to create hot partitions, an imbalance introduced by uneven item access from within the database. As hot partitions are a known antipattern, DynamoDB restricts the number of hits on the same partition (documented to be 3,000 RCUs and 1,000 WCUs per partition at the time of writing). Aware of this limitation and of Zipfian’s imbalances, we compared both ScyllaDB and DynamoDB at scale.
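The interaction between a Zipfian access pattern and the per-partition limits can be estimated with a small sketch. It computes the share of requests that land on the single most popular item and checks that load against the 3,000 RCU and 1,000 WCU limits quoted above. The skew exponent of 0.99, the item count, and the function names are assumptions for illustration and are not values from the report. The sketch also treats the hottest item as the only load on its partition and ignores any adaptive behavior on DynamoDB's side, so it is a back-of-the-envelope estimate only.

```python
def zipf_top_share(num_items: int, exponent: float = 0.99) -> float:
    """Fraction of all requests that hit the single most popular item."""
    normalizer = sum(1.0 / rank ** exponent for rank in range(1, num_items + 1))
    return 1.0 / normalizer


def exceeds_partition_limit(reads_per_sec: float, writes_per_sec: float,
                            num_items: int, exponent: float = 0.99,
                            rcus_per_read: int = 1, wcus_per_write: int = 2,
                            rcu_limit: int = 3000, wcu_limit: int = 1000) -> bool:
    """True if the hottest item alone would exceed a per-partition limit."""
    share = zipf_top_share(num_items, exponent)
    hot_rcus = reads_per_sec * share * rcus_per_read
    hot_wcus = writes_per_sec * share * wcus_per_write
    return hot_rcus > rcu_limit or hot_wcus > wcu_limit


# Hypothetical table of 100,000 items under a 50:50 mix at 125 kops/s.
share = zipf_top_share(100_000)
print(f"hottest item receives {share:.1%} of requests")
print(exceeds_partition_limit(62_500, 62_500, 100_000))
```

Because the normalizing sum grows slowly with the number of items, the hottest key keeps a noticeable share of traffic even in large tables.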

Once DynamoDB limits were reached, requests started getting throttled, requiring the application to retry. As a result of the throttling and retries, the latency measured by YCSB was severely impacted. Under some circumstances, we observed DynamoDB throttling requests for up to ~2.5 seconds, preventing the application from accessing the item during that time.
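To illustrate how throttling plus retries inflates the latency an application observes, here is a generic retry loop with capped exponential backoff and jitter. It is a sketch of a common client-side pattern and not the retry policy of the YCSB client or the AWS SDK used in this benchmark. `ThrottledError`, `call_with_backoff`, and all delay values are hypothetical.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response from the database."""


def call_with_backoff(operation, max_attempts=6, base_delay_s=0.05,
                      max_delay_s=1.0, sleep=time.sleep, rng=random.random):
    """Retry `operation` on throttling; return (result, total_wait_seconds)."""
    waited = 0.0
    for attempt in range(max_attempts):
        try:
            return operation(), waited
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Exponential backoff with full jitter, capped at max_delay_s.
            delay = rng() * min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay)
            waited += delay
```

Every throttled attempt adds its backoff delay on top of the eventual service time, so a request that is throttled several times in a row can take far longer than a single round trip.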

At best, DynamoDB delivered 39.72% of the expected throughput (88.19 good kops/s vs. the target 222 kops/s). At worst, it delivered only 16.22% (20.28 kops/s) of what was purchased (125 kops/s) and supposedly provisioned. Given the documented DynamoDB limitations, some discrepancy is to be expected. However, the magnitude of this unfulfilled throughput was surprising.
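The delivered-throughput percentages follow directly from the achieved and target rates. A minimal sketch of that arithmetic, using only the figures quoted above (the function name is illustrative):

```python
def delivered_percent(achieved_kops: float, target_kops: float) -> float:
    """Share of the target throughput that was actually delivered, in percent."""
    return 100.0 * achieved_kops / target_kops


for label, achieved, target in (("best", 88.19, 222.0), ("worst", 20.28, 125.0)):
    pct = delivered_percent(achieved, target)
    print(f"{label}: {pct:.1f}% delivered, {100 - pct:.1f}% unfulfilled")
# best: 39.7% delivered, 60.3% unfulfilled
# worst: 16.2% delivered, 83.8% unfulfilled
```

Put differently, between roughly 60% and 84% of the provisioned throughput went unused in these Zipfian runs.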

Unlike DynamoDB, ScyllaDB managed to sustain the target throughput without any throttling and still delivered single-digit millisecond latencies. ScyllaDB does not impose any hard limits on querying hot partitions. Although ScyllaDB implements concurrency-limiting mechanisms, frequent access to popular items typically benefits from its cache implementation, which explains why no failures were seen during the Zipfian tests. Even so, it is worth underscoring that hot partitions are an antipattern, even for databases like ScyllaDB. Read more about ScyllaDB’s advanced control mechanisms in this blog.

The Bottom Line

As the results indicate, what might begin at a seemingly reasonable cost can quickly escalate into “bill shock” with DynamoDB – especially as the throughput increases, and particularly with write-heavy workloads. This makes it a suboptimal choice for data-intensive applications anticipating steady or rapid growth. ScyllaDB’s significantly lower costs – a reflection of ScyllaDB taking full advantage of modern infrastructure for high throughput and low latency – make it a more cost-effective solution for data-intensive applications.

ScyllaDB – with its LSM-tree-based storage, unified caching, shard-per-core design, and advanced schedulers – allows you to maximize the advantages of modern hardware, from huge CPU chips to blazing-fast NVMe.

This small-scale benchmark demonstrated how a 57K OPS workload with a 50:50 read/write ratio that cost $4,500/month on-demand on ScyllaDB would cost $30,000 to $200,000/month with DynamoDB. Beyond those cost savings, ScyllaDB sustains 2X peaks and provides 2X-4X better P99 latency. Additionally, it can further reduce latency when idle – or enable spare resources to be shared across multiple tables. For larger workloads spanning 500K-1M OPS and beyond, this can result in a cost saving in the millions – with better performance and fewer query limitations.