Close-to-the-metal architecture handles millions of OPS with predictable single-digit millisecond latencies.
Learn MoreClose-to-the-metal architecture handles millions of OPS with predictable single-digit millisecond latencies.
Learn MoreScyllaDB is purpose-built for data-intensive apps that require high throughput & predictable low latency.
Learn MoreLevel up your skills with our free NoSQL database courses.
Take a CourseOur blog keeps you up to date with recent news about the ScyllaDB NoSQL database and related technologies, success stories and developer how-tos.
Read MoreYou may be interested in moving to a scalable, reliable NoSQL database platform like ScyllaDB, but aren’t sure which interface to use: our Apache Cassandra® compatible Cassandra Query Language (CQL) interface or our newer Amazon DynamoDB® compatible interface. Or maybe you are familiar with one interface already and want to understand more about the other.
Bottom line:
While we generally recommend our CQL interface, let’s examine the details of each of these interfaces, their capabilities, and show which one is best for your use case.
SELECT time, value
FROM events
WHERE event_type = 'myEvent'
AND time > '2011-02-03'
AND time <= '2012-01-01'
However, CQL is definitely a NoSQL interface. It does not support all SQL features such as foreign keys and JOINs across tables. It also has some commands unique to itself. You can read our full documentation on CQL here.
ScyllaDB announced Project Alternator to provide an open source alternative to the popular Amazon DynamoDB database, designed around cost, performance, and openness. It utilizes the same underlying storage engine, but conforms to the unique differences of DynamoDB: its JSON query language, its key-value and document data models, as well as its RESTful API using an HTTP/HTTPS transport mechanism.
DynamoDB structures queries using a JavaScript Object Notation (JSON) document model. But don’t be confused: you can use JSON in conjunction with many programming languages, not just JavaScript. JSON’s text syntax facilitates structured data interchange across programming languages. Also note that JSON is not a complete implementation of data interchange. Instead, it is a form of data encapsulation.
DynamoDB queries and statements can get quite complex, including data types, conditions, expressions, filters, consistency levels and comparison operators. Here is an example of a simple query written in JSON:
{
TableName: "Table_X"
KeyConditionExpression: "LastName = :a",
ExpressionAttributeValues: {
:a = "smith"z
}
}
Amazon positions DynamoDB as a key-value and document database. While DynamoDB uses a JSON document model for its queries, it does not store data natively in JSON format. Data in DynamoDB is written in one table and is distributed across partitions, which enables it to scale horizontally. As a partition is filled, another one is added. As in CQL, these partitions are lists of items, sorted by a sort key (analogous to the CQL clustering key).
A very visible difference between the CQL and DynamoDB APIs is the transport protocol used to send requests and responses. CQL uses its own binary protocol for transporting SQL-like requests, while DynamoDB uses a JSON-formatted RESTful API for requests and responses over HTTP or HTTPS (secured via SSL/TLS).
DynamoDB’s use of popular standards — JSON and HTTP — has an obvious benefit: They are familiar to most users, and many programming languages have built-in tools for them. This, at least in theory, makes it easier to write client code for this protocol compared to CQL’s non-standard protocol, which requires special client libraries (a.k.a. “client drivers”). Moreover, the generality of JSON and HTTP also makes it easy to add more features to this API over time, unlike rigid binary protocols that need to change and further complicate the client libraries. In the last 8 years, the CQL binary protocol underwent four major changes (it is now in version 5), while the DynamoDB API added features gradually without changing the protocol at all.
However, closer inspection reveals that special client libraries are needed for DynamoDB as well. The problem is that beyond the standard JSON and HTTP, several other non-standard pieces are needed to fully utilize this API. Among other things, clients need to sign their requests using a non-standard signature process, and DynamoDB’s data types are encoded in JSON in non-obvious ways.
So DynamoDB users cannot really use JSON or HTTP directly. Just like CQL users, they also use a client library that Amazon or the AWS community provided. For example, the Python SDK for AWS (and for DynamoDB in particular) is called boto3. The bottom line is that both CQL and DynamoDB ecosystems have client libraries for many programming languages, with a slight numerical advantage to CQL. For users who use many AWS services, the ability to use the same client library to access many different services — not just the database — is a bonus. But we want to limit ourselves just to the database, and DynamoDB’s use of standardized protocols ends up being of no real advantage to the user.
We already noted above how the DynamoDB API uses the HTTP transport in the traditional way: Even if requests are sent one after another on the same connection (i.e., HTTP Keep-Alive), the requests are independent, signed independently and parsed independently.
In many intensive (and therefore, expensive) use cases, the client sends a huge number of similar requests with slightly different parameters. In the DynamoDB API, the server needs to parse these very similar requests again and again, wasting valuable CPU cycles.
The CQL query language offers a workaround: prepared statements. The user asks to prepare a parameterized statement — e.g., “insert into tb (key, val) values (?, ?)” with two parameters marked with question-marks. This statement is parsed only once, and further requests to the same node can reuse the pre-parsed statement, passing just the two missing values. The benefit of prepared statements are especially pronounced in small and quick requests, such as reads that can frequently be satisfied from cache, or writes. In this blog post we demonstrated in one case that using prepared statements can increase throughput by 50% and lower latency by 33%.
Amazon DynamoDB came out in 2012, two years after Cassandra. DynamoDB’s data model was inspired by Cassandra’s. In both DynamoDB and Cassandra, database rows are grouped together in a partition by a partition key, with the rows inside the partition sorted by a clustering key. The DynamoDB API and CQL use different names for these same concepts, though:
CQL term | DynamoDB term | |
Smallest addressable content | row | item |
Sub-entities | column | attribute |
Unit of distribution | partition (many rows) | partition (many items) |
Key for distribution | partition key | partition key (also hash key) |
Key within partition | clustering key | sort key (also range key) |
Despite this basic similarity in how the data is organized — and therefore in how data can be efficiently retrieved — there is a significant difference in how DynamoDB and CQL model writes. The two APIs offer different write primitives and make different guarantees on the performance of these primitives:
Cassandra is optimized for write performance. Importantly, ordinary writes do not require a read of the old item before the write: A write just records the change, and only read operations (and background operations known as compaction and repair) will later reconcile the final state of the row. This approach, known as Conflict-free Replicated Data Types (CRDT), has implications on Cassandra’s implementation, including its data replication and its on-disk storage in a set of immutable files called sstables (comprising an LSM). But more importantly for this post, the wish that ordinary writes should not read also shaped Cassandra’s API, CQL:
Contrasting with CQL’s approach, DynamoDB’s approach is that every write starts with a read of the old value of the item. In other words, every write is a read-modify-write operation. Again, this decision had implications on how DynamoDB was implemented. DynamoDB uses leader-based replication so that a single node (the leader for the item) can know the most recent value of an item and serialize write operations on it. DynamoDB stores data on disk using B-trees which allow reads together with writes. But again, for the purpose of this post, the interesting question is how making every write a read-modify-write operation shaped DynamoDB’s API:
This comparison demonstrates that the write features that the DynamoDB and CQL APIs provide are actually very similar: Much of what DynamoDB can do with its various read-modify-write operations can be done in CQL with the lightweight transactions (LWT) feature or by simple extensions of it. In fact, this is what ScyllaDB does internally: We implemented CQL’s LWT feature first, and then used the same implementation for the write operations of the DynamoDB API.
However, despite the similar write features, the two APIs are very different in their performance characteristics:
This requirement has led ScyllaDB’s implementation of the DynamoDB API to use the heavyweight (Paxos-based) LWT mechanism for every write, even simple ones that do not require reads, making them significantly slower than the same write performed via the CQL API. We will likely optimize this implementation in the future, perhaps by adopting a leader model (like DynamoDB), but an important observation is that in every implementation of the DynamoDB API, writes will be significantly more expensive than reads. Amazon prices write operations at least 5 times more expensive than reads (this ratio reaches 40 for weakly-consistent reads of 4 KB items).
Amazon prices a write unit (WCU) at 5 times the cost of a read unit (RCU). Additionally, for large items, each 1 KB is counted as a write unit, while a read unit is 4 KB. Moreover, eventually-consistent reads are counted as half a read unit. So writing a large item is 40 (=5*4*2) times more expensive than reading it with an eventually-consistent read.
ScyllaDB provides users with an option to do DynamoDB-API write-only updates (updates without a condition, update expression, etc.) efficiently, without LWT. However, this option is not officially part of the DynamoDB API, and as explained above — cannot be made safe without adding an assumption that write-only updates and read-modify-write updates to the same item cannot be mixed.
Above we noted that both APIs — DynamoDB and CQL — model their data in the same way: as a set of partitions indexed by a partition key, and each partition is a list of items sorted by a sort key.
Nevertheless, the way that the two APIs model data inside each item is distinctly different:
It might appear, therefore, that DynamoDB’s schema-less data model is more general and more powerful than CQL’s schema-full data model. But this isn’t quite the case. When discussing CRDT above, we already mentioned that CQL supports collection columns. In particular, a CQL table can have a map column, and an application can store in that map any attributes that aren’t part of its usual schema. CQL’s map does have one limitation, though: The values stored in a map must all have the same type. But even this limitation can be circumvented, by serializing values of different types into one common format such as a byte array. This is exactly how ScyllaDB implemented the DynamoDB API’s schema-less items on top of its existing schema-full CQL implementation.
So it turns out that the different approaches that the two APIs took to schemas have no real effect on the power or the generality of the two APIs. Still, this difference does influence what is more convenient to do in each API:
We surveyed some of the major differences between the Amazon DynamoDB API and Apache Cassandra’s CQL API — both of which are now supported in one open-source database, ScyllaDB (The Fastest NoSQL Database). Despite a similar data model and design heritage, these two APIs differ in the query language, in the protocol used to transport requests and responses, and somewhat even in the data model (e.g., schema-full vs. schema-less items).
We showed that in general, it is easier to write client libraries for the DynamoDB API because it is based on standardized building blocks such as JSON and HTTP. Nevertheless, we noted that this difference does not translate to a real benefit to users because such client libraries already exist for both APIs, in many programming languages. But the flip side is that the simplicity of the DynamoDB API costs in performance: The DynamoDB API requires extra network hops (and therefore increased costs and latency) because clients do not need to be aware of which data belongs to which node. The DynamDB API also requires more protocol and parsing work and more TCP connections.
We discussed some possible directions in which the DynamoDB API could be improved (such as adopting HTTP/2 and a new mechanism for prepared statements) to increase its performance. We don’t know when, if ever, such improvements might reach Amazon DynamoDB. But as an experiment, we could implement these improvements in the open-source ScyllaDB and measure what kinds of benefits an improved API might bring. When we do, look forward to a follow-up on this post detailing these improvements.
We also noted the difference in how the two APIs handle write operations. The DynamoDB API doesn’t make a distinction between writes that need the prior value of the item and writes that do not. This makes it easier to write consistent applications with this API, but also means that writes with the DynamoDB API are always significantly more expensive than reads. Conversely, the CQL API does make this distinction — and does forbid mixing the two types of writes. Therefore, the CQL API allows write-only updates to be optimized, while the read-modify-write updates remain slower. For some applications, this can be a big performance boost.
The final arbiter in weighing these APIs is subjective in answer to the question: “Which will be better for your use case?” Due to many of the points made above, and due to other features which at this point are only available through ScyllaDB’s CQL implementation, ScyllaDB currently advises users to adopt the CQL interface for all use cases apart from “lift and shift” migrations from DynamoDB.
Looking for extensive training on data modeling, database migration, and high performance for NoSQL Databases? Our experts offer 3-hour masterclasses that assists practitioners wanting to migrate from SQL to NoSQL or advance their understanding of NoSQL data modeling. This free, self-paced class covers techniques and best practices on these NoSQL concepts that will help you steer clear of mistakes that could inconvenience any engineering team.
You can access the complete course here.
Now that you’ve learned the differences, it’s time to try your workloads on ScyllaDB to realize the performance benefits it can offer. Here’s some articles for how to implement your migration:
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Amazon DynamoDB® and Dynamo Accelerator® are trademarks of Amazon.com, Inc. No endorsements by The Apache Software Foundation or Amazon.com, Inc. are implied by the use of these marks.