DynamoDB vs Cassandra - The Ultimate Comparison
Written by Manusha Chethiyawardhana
Published on May 18th, 2022
Amazon DynamoDB and Apache Cassandra are two of the most powerful cloud-based scalable NoSQL database services that are widely used in many projects.
This article will discuss the similarities and differences between these two services to help you choose the best one for your project.
DynamoDB and Apache Cassandra: An Overview
Amazon DynamoDB is a fully managed, scalable NoSQL database service provided by AWS. If you need a key-value store with a flexible schema and no infrastructure to manage, DynamoDB is a great choice. It also offers high availability, scalability, and security, and it delivers millisecond-range latency at any scale.
Apache Cassandra is an open-source, distributed NoSQL database management system. It is intended to handle large amounts of data across many commodity servers in an application-transparent manner while maintaining high availability and avoiding single points of failure. Cassandra will repartition itself as machines are added and removed from the cluster.
Shared Attributes for DynamoDB and Cassandra
Both DynamoDB and Apache Cassandra are well-known distributed data store technologies. Both are used in various applications and have proven their efficiency at any scale. Both databases can manage data without using a specific column schema. The concept of tables is still present, but there is no requirement for a specific set of columns, as there is with relational databases.
Both databases store data in sparse rows, which means that they only store the columns present in a given row. Each table has a primary key that uniquely identifies rows or items. The primary key consists of two parts. The first is the partition key, which is required and determines data placement by dividing the table's rows into partitions. The second is an optional key that sorts the rows within a partition. This is referred to as the clustering key in Cassandra and the sort key in DynamoDB.
Cassandra and DynamoDB both group and distribute data based on the partition key's hashed value.
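As a minimal sketch of hash-based placement, the function below maps a partition key to one of a fixed number of partitions. It is only an illustration: MD5 and the modulo step are stand-ins chosen for simplicity, not the hash function or token-range scheme either database actually uses, and `partition_for` is a hypothetical helper name.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index using a hash of the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer,
    # then wrap it into the available partitions.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Illustrative keys only; the same key always lands on the same partition.
keys = ["user#1", "user#2", "user#3", "user#1"]
placements = [partition_for(key, 4) for key in keys]
```

Because placement depends only on the hashed key, every row that shares a partition key ends up together, which is why key choice drives how evenly data spreads.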
Data Model: DynamoDB vs. Cassandra
Apache Cassandra is a column-oriented data store, whereas Amazon DynamoDB is a key-value and document-oriented store. Although DynamoDB can store a wide range of data types, Cassandra's list of supported data types is longer. For example, it includes types such as counter, duration, inet, and varint.
Partition keys and sort keys in DynamoDB can only have one attribute. On the other hand, Cassandra allows for multiple attributes in partition keys and clustering keys, resulting in a composite key.
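Because a DynamoDB key holds a single attribute, one workaround often seen in DynamoDB data models (it is not part of this comparison itself) is to concatenate several values into one string key. The sketch below shows the idea; `composite_sort_key` and the `#` separator are hypothetical choices for illustration.

```python
def composite_sort_key(*parts: str, sep: str = "#") -> str:
    """Join several values into a single string usable as one key attribute."""
    for part in parts:
        if sep in part:
            # A separator inside a value would make the key ambiguous.
            raise ValueError(f"key part {part!r} contains the separator {sep!r}")
    return sep.join(parts)


# Illustrative values only.
sort_key = composite_sort_key("US", "WA", "Seattle")
```

In Cassandra, the same three values could simply be declared as separate columns of the clustering key.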
A DynamoDB partition is not the same as a Cassandra partition. In DynamoDB, a partition is a physical allocation of storage for a chunk of a table, and it can hold up to 10 GB. In Cassandra, a partition is the set of rows in a column family that share the same partition key and are therefore stored on the same node.
Amazon DynamoDB is built for scalability and performance. Most DynamoDB response times are measured in single-digit milliseconds. According to AWS, DynamoDB can handle more than ten trillion requests per day and peaks of more than twenty million requests per second.
DynamoDB Accelerator (DAX), an in-memory cache, can improve read performance by up to tenfold, reducing read times from milliseconds to microseconds. In addition, DAX provides increased throughput and potential operational cost savings for read-heavy or bursty workloads by reducing the need to overprovision read capacity units.
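The general idea of an in-memory cache sitting in front of a table can be sketched as a read-through cache. This is a toy model, not the DAX client API: `ReadThroughCache` is a hypothetical class, the "table" is a plain dictionary, and there is no eviction or expiry.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; go to the backing store only on a miss."""

    def __init__(self, load_item):
        self._load_item = load_item  # function that reads from the backing store
        self._items = {}
        self.backend_reads = 0

    def get(self, key):
        if key not in self._items:
            # Cache miss: read from the backing store and remember the result.
            self.backend_reads += 1
            self._items[key] = self._load_item(key)
        return self._items[key]


# A dictionary stands in for the table in this illustration.
table = {"user#1": {"name": "Ada"}}
cache = ReadThroughCache(table.get)
first = cache.get("user#1")
second = cache.get("user#1")
```

Only the first read reaches the backing store, which is how a cache can reduce the read capacity a read-heavy workload consumes.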
Furthermore, you can improve DynamoDB performance by designing your tables well, using keys and indexes effectively in queries, and streamlining the database workload.
Cassandra's performance depends heavily on how its data model is designed. A well-designed data model leads to better performance.
Cassandra's three data modeling 'dogmas' are as follows:
- Disk space is inexpensive.
- Writes are inexpensive.
- Network communication is costly.
Cassandra's performance benefits include nearly linear scalability, fast data writing, and nearly constant data availability. Downsides include data consistency issues that can arise from time to time and indexing that is far from perfect.
If you need to read a table with a large number of columns, you may run into issues. Cassandra partitions have a recommended maximum size of about 100 MB and a limit of 2 billion values each. So, to achieve better performance, you need to model the data to reduce the number of partition reads and distribute data evenly across the cluster.
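A back-of-the-envelope check against these limits might look like the sketch below. It assumes a deliberately rough model (values = rows × regular columns, with a uniform average value size) and ignores storage overhead; `estimate_partition` and its parameters are hypothetical names.

```python
MAX_VALUES_PER_PARTITION = 2_000_000_000
MAX_PARTITION_BYTES = 100 * 1024 * 1024  # the 100 MB figure quoted above


def estimate_partition(rows: int, regular_columns: int, avg_value_bytes: int) -> dict:
    """Roughly estimate the value count and size of a single partition."""
    values = rows * regular_columns
    size_bytes = values * avg_value_bytes
    return {
        "values": values,
        "size_bytes": size_bytes,
        "within_value_limit": values <= MAX_VALUES_PER_PARTITION,
        "within_size_limit": size_bytes <= MAX_PARTITION_BYTES,
    }


# Illustrative numbers only: one million rows of ten 20-byte values.
estimate = estimate_partition(rows=1_000_000, regular_columns=10, avg_value_bytes=20)
```

In this example the partition stays far below the value limit but exceeds the size figure, which suggests splitting the data across more partition keys.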
Both DynamoDB and Cassandra support user authentication and data access authorization. For client and inter-node communication, both use encryption.
DynamoDB also encrypts data at rest, which protects your data by encrypting it while it is on disk. DynamoDB offers three key options for encrypting tables using AWS KMS.
- AWS-owned key: the default option. The key is owned by DynamoDB and is free of charge.
- AWS-managed key: the key is stored in your account and managed by AWS KMS, for a fee.
- Customer-managed key: you create and manage the key in your account, which gives you complete control over it, for a fee.
Cassandra allows you to configure client-to-node and node-to-node encryption separately. When encryption is enabled, the JVM defaults for supported protocols and cipher suites are used in both cases.
Cassandra's data access is role-based, and the smallest level of granularity is a row. In DynamoDB, it is more common to assign specific permissions and access keys to users using IAM policies. An attribute is the smallest level of access granularity in DynamoDB.
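As a rough illustration of attribute-level access in DynamoDB, the sketch below builds an IAM policy document as a Python dictionary and serializes it to JSON. The account ID, region, table name, and attribute names are hypothetical; `dynamodb:Attributes` and `dynamodb:Select` are the IAM condition keys used for fine-grained access control.

```python
import json

# Hypothetical account ID, region, and table name.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/Users"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": TABLE_ARN,
            "Condition": {
                # Only these (hypothetical) attributes may be requested.
                "ForAllValues:StringEquals": {
                    "dynamodb:Attributes": ["UserId", "DisplayName"]
                },
                # Require callers to name attributes explicitly rather than
                # asking for every attribute of the item.
                "StringEqualsIfExists": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                },
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

A policy like this would let a user read two attributes of each item while keeping the remaining attributes hidden.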
Other Features Comparison
To meet your storage and throughput requirements, Amazon DynamoDB can scale the resources dedicated to a table to thousands of servers distributed across multiple availability zones, supporting virtually any table size. As workloads increase and decrease, it scales seamlessly to match demand. The high scalability does not affect performance, and you can choose between two capacity modes: on-demand or provisioned.
By allowing for the seamless addition of nodes, Apache Cassandra also meets the requirements of an ideal horizontally scalable system. When you need more capacity, you can add nodes to the cluster, and the cluster will automatically utilize the new resources. However, Cassandra's scalability is limited in comparison to DynamoDB.
Availability and Durability
Every piece of data in DynamoDB is replicated to multiple physical nodes located in different availability zones to ensure high durability and availability. If one copy fails, the others keep operations running. DynamoDB also stores data on SSDs, which helps keep response times and latency low.
Cassandra implements a fault-tolerant storage system to ensure data availability. A gossip protocol is used to detect a node's failure. Gossip is a peer-to-peer communication protocol in which nodes regularly exchange state information about themselves and other nodes they are aware of.
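The exchange of state between peers can be sketched with a toy model in which each node keeps a map of the highest heartbeat it has seen per node. This is a simplified illustration, not Cassandra's actual gossip implementation: peers are paired by hand instead of being picked at random, and `merge_state` and `gossip_round` are hypothetical helpers.

```python
def merge_state(local: dict, remote: dict) -> dict:
    """Keep the highest heartbeat seen for each node."""
    merged = dict(local)
    for node, heartbeat in remote.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged


def gossip_round(views: dict, pairs: list) -> None:
    """Each pair of nodes exchanges state and ends up with the same merged view."""
    for a, b in pairs:
        combined = merge_state(views[a], views[b])
        views[a] = dict(combined)
        views[b] = dict(combined)


# Each node initially knows only its own heartbeat (illustrative values).
views = {"n1": {"n1": 5}, "n2": {"n2": 3}, "n3": {"n3": 7}}
gossip_round(views, [("n1", "n2"), ("n2", "n3")])
```

After this round, n3 knows about all three nodes even though it never talked to n1 directly, while n1 has not yet heard of n3. A node whose heartbeat stops advancing in its peers' views can then be suspected of having failed.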
Similar to DynamoDB, Cassandra ensures data durability by using replicas on different cluster nodes. Replicas in a multi-datacenter environment may also be stored in different data centers.
Time To Live
Time To Live (TTL) is a mechanism that removes items from your table after a certain amount of time has passed.
TTL in DynamoDB is a timestamp value that represents the date and time at which the item expires. In comparison, Cassandra defines TTL as the number of seconds from when a row is created or updated until the row expires.
TTL is applied at the item level in DynamoDB, but it is applied at the column level in Cassandra.
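The difference between the two TTL representations can be sketched as a pair of conversions: DynamoDB expects an absolute expiry time in Unix epoch seconds, while Cassandra takes a relative duration in seconds. The function names and the fixed `now` value below are hypothetical and used only to make the example deterministic.

```python
import time


def dynamodb_expiry_timestamp(ttl_seconds: int, now: float = None) -> int:
    """Absolute expiry time in Unix epoch seconds, as stored in a DynamoDB TTL attribute."""
    current = time.time() if now is None else now
    return int(current) + ttl_seconds


def cassandra_ttl_seconds(expires_at: int, now: float = None) -> int:
    """Relative lifetime in seconds, as passed to Cassandra when writing a row."""
    current = time.time() if now is None else now
    return max(0, expires_at - int(current))


# Fixed clock value for illustration: expire one hour from "now".
expires_at = dynamodb_expiry_timestamp(3600, now=1_700_000_000)
remaining = cassandra_ttl_seconds(expires_at, now=1_700_000_000)
```

Here `expires_at` is 1700003600 and `remaining` is 3600, which are two ways of describing the same expiry.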
When and Where to Pick Which Database Service
Use Cases for DynamoDB
DynamoDB is a good option when you need to store and model structured and semi-structured data together.
DynamoDB is recommended for the Internet of Things (IoT), real-time bidding platforms, content management, recommendation engines, and gaming applications. The features of DynamoDB that we've discussed, such as high availability, low latency, and rapid scalability, can all help it perform well in these situations.
Amazon DynamoDB is also used by well-known companies like Netflix, Medium, Lyft, and Duolingo. So you shouldn't be afraid to use DynamoDB.
Use Cases for Cassandra
Cassandra supports structured, semi-structured, and unstructured data storage.
Cassandra is useful for IoT, recommendation and personalization engines, fraud detection, messaging systems, and similar workloads. Cassandra's fast write and read operations, extremely low latency, and linear scalability make it excellent for these applications.
Uber Technologies, Facebook, and Spotify all use Cassandra. Netflix also uses Cassandra, demonstrating that it is possible to build a powerful distributed database system using both of these databases.
DynamoDB's core features are billed based on usage, and additional services such as backups are charged separately. The way DynamoDB counts read and write units can be confusing. For example, one read request unit covers one strongly consistent read of an item up to 4 KB, or two eventually consistent reads of items up to 4 KB each. In other words, an eventually consistent read of up to 8 KB costs one read request unit, while a strongly consistent read of the same item costs two.
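The read-unit arithmetic can be sketched as follows, assuming reads are metered in 4 KB blocks and an eventually consistent read costs half as much as a strongly consistent one. The function name is hypothetical, and transactional reads are left out.

```python
import math

BLOCK_BYTES = 4 * 1024  # reads are metered in 4 KB blocks


def read_request_units(item_size_bytes: int, strongly_consistent: bool) -> float:
    """Read request units consumed by one read of an item of the given size."""
    # Item size is rounded up to the next 4 KB block, with a minimum of one block.
    blocks = max(1, math.ceil(item_size_bytes / BLOCK_BYTES))
    return blocks * (1.0 if strongly_consistent else 0.5)


eventual_8kb = read_request_units(8 * 1024, strongly_consistent=False)
strong_8kb = read_request_units(8 * 1024, strongly_consistent=True)
strong_4kb = read_request_units(4 * 1024, strongly_consistent=True)
```

With these inputs, the 8 KB eventually consistent read and the 4 KB strongly consistent read each cost one unit, while the 8 KB strongly consistent read costs two.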
DynamoDB offers two capacity modes:
On-demand capacity mode: DynamoDB scales your workload up or down as needed. You pay per request for the reads and writes your application actually performs.
Provisioned capacity mode: You must specify the number of read and write units required for your application in this mode. The price is then calculated based on the read and write capacity units you define.
In addition, the DynamoDB free tier provides 25 GB of data storage, 2.5 million stream read requests, and 100 GB of data transfer out to the internet.
Cassandra is less expensive than DynamoDB for particular use cases, such as write-intensive workloads. It is possible to set up a Cassandra cluster for free. However, managing the cluster is very difficult, even with the necessary hardware.
As a solution, you can use managed Apache Cassandra instances. Managed Apache Cassandra provides four pricing models:
- Developer - 49 USD per node/month.
- Production - 250 USD per node/month.
- Enterprise - Custom pricing.
- IBM Cloud and On-Premises - Custom pricing.
This article compared and contrasted two of the most popular NoSQL database services, AWS DynamoDB and Apache Cassandra. I hope you now have a firm grasp of them and their applications. Thank you for reading!