40 DynamoDB Best Practices [Bite-sized Tips]
Written by Rafal Wilinski
Published on March 30th, 2020
1. Use AWS.DocumentClient instead of AWS.DynamoDB
When writing JS/TS code for DynamoDB, use AWS.DocumentClient instead of the low-level AWS.DynamoDB client. It's much simpler to use and converts responses to native JS types, making it easier to work with JSON data.
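For instance, here's a minimal sketch of reading an item with the DocumentClient, assuming a hypothetical "Users" table with an "id" partition key:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

// The DocumentClient accepts and returns plain JS values, no { S: '...' } wrappers needed
documentClient.get(
  { TableName: 'Users', Key: { id: 'user-123' } },
  (err, data) => {
    if (err) console.error(err);
    else console.log(data.Item); // already unmarshalled into a native JS object
  }
);
```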
2. Store Blobs in S3 instead of DynamoDB
While DynamoDB allows storing Blobs inside tables (e.g., pictures using Binary or base64 format), it's a much better practice to upload actual assets to S3 and store only links to them inside the database. This approach reduces the size of your DynamoDB items and leverages S3's optimized storage for large objects.
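A minimal sketch of that pattern, with hypothetical bucket, table, and attribute names:

```js
const AWS = require('aws-sdk');

const s3 = new AWS.S3();
const documentClient = new AWS.DynamoDB.DocumentClient();

async function saveAvatar(userId, imageBuffer) {
  const key = `avatars/${userId}.png`;

  // The heavy blob goes to S3...
  await s3
    .putObject({ Bucket: 'my-assets-bucket', Key: key, Body: imageBuffer })
    .promise();

  // ...and DynamoDB stores only the pointer to it
  await documentClient
    .update({
      TableName: 'Users',
      Key: { id: userId },
      UpdateExpression: 'SET avatarUrl = :url',
      ExpressionAttributeValues: { ':url': `s3://my-assets-bucket/${key}` },
    })
    .promise();
}
```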
3. Use Promises instead of Callbacks
When working with DynamoDB, instead of using JavaScript callbacks, end each call with .promise(). This makes the library return a Promise that you can .then() or await on, which makes your code much more elegant and easier to read, especially when dealing with asynchronous operations.
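The same get call from tip 1, rewritten with .promise() and await (table and key names are again hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function getUser(id) {
  // .promise() turns the AWS.Request into a native Promise you can await
  const result = await documentClient
    .get({ TableName: 'Users', Key: { id } })
    .promise();
  return result.Item;
}
```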
4. Use BatchGetItem API for querying multiple tables
Use the BatchGetItem API to get up to 100 items, identified by primary key, from multiple DynamoDB tables at once in parallel, instead of wrapping a series of calls in Promise.all. It's much faster but can only return up to 16MB of data per call.
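A sketch of a single batchGet call spanning two hypothetical tables:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function fetchBatch() {
  const { Responses, UnprocessedKeys } = await documentClient
    .batchGet({
      RequestItems: {
        Users: { Keys: [{ id: 'user-123' }, { id: 'user-456' }] },
        Orders: { Keys: [{ orderId: 'order-789' }] },
      },
    })
    .promise();

  // Responses.Users and Responses.Orders hold the fetched items;
  // anything throttled or over the size limit comes back in UnprocessedKeys and should be retried.
  return Responses;
}
```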
5. Use ‘AttributesToGet’ to make API responses faster
To make your API responses faster, add the AttributesToGet parameter to your get calls. This returns less data from the DynamoDB table and reduces overhead in data transport and unmarshalling.
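For example, asking only for the fields you actually need (table and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function getUserContactInfo(id) {
  const { Item } = await documentClient
    .get({
      TableName: 'Users',
      Key: { id },
      AttributesToGet: ['email', 'firstName'], // only these attributes are returned
    })
    .promise();
  return Item;
}
```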
6. Use 'convertEmptyValues: true' in DocumentClient
In the JS SDK, pass convertEmptyValues: true to the DocumentClient constructor to prevent validation exceptions: empty strings, binary buffers, and sets are automatically converted to the NULL type when persisting to DynamoDB.
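It's a one-line change in the constructor:

```js
const AWS = require('aws-sdk');

// Empty strings, buffers, and sets are persisted as NULL instead of throwing
const documentClient = new AWS.DynamoDB.DocumentClient({
  convertEmptyValues: true,
});
```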
7. Use DynamoDB.batchWriteItem for batch writes
Use DynamoDB.batchWriteItem calls to write up to 16MB of data or perform up to 25 writes across multiple tables with a single API call. This reduces the overhead of establishing HTTP connections and improves write efficiency.
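A sketch of a batchWrite (the DocumentClient counterpart of BatchWriteItem) mixing puts and deletes across hypothetical tables:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function writeBatch() {
  const { UnprocessedItems } = await documentClient
    .batchWrite({
      RequestItems: {
        Users: [
          { PutRequest: { Item: { id: 'user-123', email: 'jane@example.com' } } },
        ],
        Orders: [
          { DeleteRequest: { Key: { orderId: 'order-789' } } },
        ],
      },
    })
    .promise();

  // Like batchGet, anything that couldn't be written shows up in UnprocessedItems and should be retried.
  return UnprocessedItems;
}
```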
8. Use IAM policies for security and enforcing best practices
You can use IAM policies to enforce best practices and restrict your developers and services from performing expensive Scan operations on DynamoDB tables. This helps in maintaining cost efficiency and performance.
9. Use DynamoDB Streams for data post-processing
Once data is saved to the table, a Lambda function subscribed to the stream can validate values, enrich information, or aggregate metrics. This decouples your core business logic from side effects. Learn more about DynamoDB Streams.
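A minimal sketch of such a Lambda handler, assuming the stream is configured with the NEW_AND_OLD_IMAGES view type:

```js
const AWS = require('aws-sdk');

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'INSERT') continue;

    // Stream records arrive in DynamoDB JSON, so unmarshall them first
    const newItem = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    // ...validate, enrich, or update an aggregate here
    console.log('New item written:', newItem);
  }
};
```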
10. Use 'FilterExpressions'
Use FilterExpressions to refine and narrow your Query and Scan results on non-indexed fields. This helps in reducing the amount of data returned and improves performance.
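For example, a Query narrowed down by a filter on a non-key attribute (table and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function getLargeOrders(userId) {
  const { Items } = await documentClient
    .query({
      TableName: 'Orders',
      KeyConditionExpression: 'userId = :userId',
      // The filter is applied after items are read, so it trims the response,
      // not the read capacity consumed
      FilterExpression: 'amount > :minAmount',
      ExpressionAttributeValues: { ':userId': userId, ':minAmount': 100 },
    })
    .promise();
  return Items;
}
```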
11. Use 'Parallel Scan' to scan through big datasets
If you need to scan through a big dataset fast, use "Parallel Scan" which divides your table into N segments and lets multiple processes scan separate parts concurrently. Learn more about DynamoDB Parallel Scans.
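A rough sketch of a parallel scan with four segments (pagination via LastEvaluatedKey is omitted for brevity; table name is hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();
const TOTAL_SEGMENTS = 4; // hypothetical number of parallel workers

const scanSegment = (segment) =>
  documentClient
    .scan({
      TableName: 'Events',
      Segment: segment,              // which slice this worker scans
      TotalSegments: TOTAL_SEGMENTS, // how many slices there are in total
    })
    .promise();

async function scanWholeTable() {
  const results = await Promise.all(
    [...Array(TOTAL_SEGMENTS).keys()].map(scanSegment)
  );
  return results.flatMap((page) => page.Items);
}
```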
12. Set 'AWS_NODEJS_CONNECTION_REUSE_ENABLED' to 1
To reduce the overhead of connecting to DynamoDB, make sure the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED is set to 1 so the SDK reuses TCP connections by default. This can significantly improve performance in high-throughput applications.
13. Use 'TransactWriteItems' to update multiple records atomically
If you need to update multiple records atomically, use Transactions and the TransactWriteItems call, which can write up to 100 items atomically across one or more tables in a single request. This ensures the consistency and integrity of your data.
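A sketch of a transaction that creates an order and bumps a counter on the user in one atomic request (table, key, and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function placeOrder() {
  await documentClient
    .transactWrite({
      TransactItems: [
        {
          Put: {
            TableName: 'Orders',
            Item: { orderId: 'order-789', userId: 'user-123', amount: 100 },
          },
        },
        {
          Update: {
            TableName: 'Users',
            Key: { id: 'user-123' },
            // if_not_exists handles the very first order gracefully
            UpdateExpression: 'SET orderCount = if_not_exists(orderCount, :zero) + :one',
            ExpressionAttributeValues: { ':zero': 0, ':one': 1 },
          },
        },
      ],
    })
    .promise();
}
```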
14. Use Contributor Insights from Day 1
Use Contributor Insights from day one to identify the most accessed items and most throttled keys which might cause you performance problems. This proactive approach helps in optimizing your table design and access patterns.
15. Use VPC endpoints to make your connections more secure
Use VPC endpoints when using DynamoDB from a private subnet to make your connections more secure and remove the need for a public IP. This enhances security by keeping traffic within the AWS network.
16. Always use 'ExpressionAttributeNames' in your queries
Because DynamoDB has over 500 reserved keywords, always use ExpressionAttributeNames to prevent errors like ValidationException - Invalid FilterExpression: Attribute name is a reserved keyword; reserved keyword: XXX. This keeps your queries robust and error-free.
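For example, updating an attribute that happens to collide with a reserved word like status (table and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function activateUser(id) {
  await documentClient
    .update({
      TableName: 'Users',
      Key: { id },
      // "#status" is an alias, so the reserved word never appears directly in the expression
      UpdateExpression: 'SET #status = :status',
      ExpressionAttributeNames: { '#status': 'status' },
      ExpressionAttributeValues: { ':status': 'ACTIVE' },
    })
    .promise();
}
```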
17. Use caching for frequently accessed items
Invest in caching early. Even the smallest caches can reduce your DynamoDB bill by up to 80%. It works especially well for frequently accessed items, reducing the load on your DynamoDB tables and improving response times.
18. Use On-Demand to identify traffic patterns
If you're unsure of your traffic patterns, use On-Demand capacity mode, which scales your DynamoDB table automatically with the number of read and write requests. This mode adjusts capacity based on your application's needs.
19. For Billing, start with On-Demand, then switch to Provisioned
Use On-Demand capacity mode to identify your traffic patterns. Once discovered, switch to provisioned mode with autoscaling enabled to save money. This approach allows you to optimize costs while ensuring performance.
20. Use 'DynamoDB Global Tables' for latency crucial applications
If latency to the end-user is crucial for your application, use DynamoDB Global Tables, which automatically replicate data across multiple regions. This way, your data is closer to the end-user. For the compute part, use Lambda@Edge functions to further reduce latency.
21. Use 'createdAt' and 'updatedAt' attributes
Add createdAt and updatedAt attributes to each item. Moreover, instead of removing records from the table, simply add a deletedAt attribute. This not only makes your delete operations reversible but also enables auditing and historical data analysis.
22. Aggregate, a lot
Instead of running expensive queries periodically for analytics purposes, use DynamoDB Streams connected to a Lambda function. It will update the result of an aggregation just-in-time whenever data changes, ensuring real-time analytics.
23. Leverage write sharding
To avoid hot partitions and spread the load more evenly across them, make sure your partition keys have high cardinality. You can achieve that by adding a random number to the end of the partition key values. This technique helps in distributing write operations more evenly.
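One possible sketch of that suffixing technique, with a hypothetical shard count and table:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();
const SHARD_COUNT = 10; // hypothetical number of shards per logical key

// e.g. "2020-03-30" becomes "2020-03-30#7"
const shardedKey = (baseKey) =>
  `${baseKey}#${Math.floor(Math.random() * SHARD_COUNT)}`;

async function recordEvent(day, payload) {
  await documentClient
    .put({
      TableName: 'Events',
      Item: { partitionKey: shardedKey(day), payload },
    })
    .promise();
}

// Note: reads for a given day must then query all shards 0..SHARD_COUNT-1 and merge the results.
```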
24. Large attributes compression
If storing large blobs outside of DynamoDB, e.g., in S3 is not an option because of increased latency, use common compression algorithms like GZIP before saving them to DynamoDB. This reduces the storage size and can improve read/write performance.
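A sketch of compressing one large attribute with Node's built-in zlib before writing and decompressing it on read (table and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');
const zlib = require('zlib');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function saveDocument(id, largeObject) {
  // The Buffer produced by gzipSync is stored as a DynamoDB Binary attribute
  const body = zlib.gzipSync(JSON.stringify(largeObject));
  await documentClient.put({ TableName: 'Documents', Item: { id, body } }).promise();
}

async function loadDocument(id) {
  const { Item } = await documentClient
    .get({ TableName: 'Documents', Key: { id } })
    .promise();
  return JSON.parse(zlib.gunzipSync(Item.body).toString());
}
```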
25. Date formats
Use the epoch date format (in seconds) if you want to support the DynamoDB TTL feature and all the filters. If you don't need TTL, you can also use the ISO 8601 format; it works for filters like "between" too and is more human-readable.
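For example, writing an expiry timestamp in epoch seconds so the TTL feature (tip 30) can pick it up; the "Sessions" table and "expiresAt" attribute are hypothetical, and TTL must be enabled on that attribute:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function createSession(id) {
  // DynamoDB TTL expects epoch *seconds*, not milliseconds
  const expiresAt = Math.floor(Date.now() / 1000) + 30 * 24 * 60 * 60; // 30 days from now

  await documentClient
    .put({ TableName: 'Sessions', Item: { id, expiresAt } })
    .promise();
}
```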
26. Use DynamoDB Local
If you need to mock your DynamoDB Table(s) for local development or tests on your machine, you can use Localstack or DynamoDB Local to run DynamoDB as a Docker container without connectivity to the cloud. It supports most of the DynamoDB APIs and can significantly speed up your development cycle.
27. Remember to backup
Before going live in production, remember to enable point-in-time backups, so there's an option to rollback your table in case of an error. Regular backups are crucial for disaster recovery and data integrity.
28. Starting to use DynamoDB? Try PartiQL
If you're just starting to use DynamoDB, you might be initially discouraged by its unconventional query syntax. If you want to use a SQL-like query language to work with DynamoDB, you can do that using PartiQL. This can make the transition easier for those familiar with SQL.
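A minimal sketch using the low-level client's executeStatement call (available in recent SDK versions); the table name is hypothetical and parameters use DynamoDB's attribute-value format:

```js
const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB();

async function findUser(id) {
  const { Items } = await dynamodb
    .executeStatement({
      Statement: 'SELECT * FROM "Users" WHERE id = ?',
      Parameters: [{ S: id }],
    })
    .promise();

  // Results come back in DynamoDB JSON, so unmarshall them into plain objects
  return Items.map((item) => AWS.DynamoDB.Converter.unmarshall(item));
}
```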
29. Use S3 Export and Athena for full-table analytics
If you need to perform whole-table operations like SELECT COUNT WHERE, export your table to S3 first and then use Athena or any other suitable tool to do so. This approach leverages the power of S3 and Athena for large-scale data analysis.
30. Use TTL to remove expired items
TTL (Time-to-live) allows you not only to remove expired items from your table for free but also reduces the overall cost of your tables! Learn more about TTL in DynamoDB here.
31. Avoid using Scans
DynamoDB has two primary methods of fetching collections of data: Query and Scan. While they might seem similar, you should always favor Queries over Scans: a Query reads only the items under a given partition key, while a Scan reads the entire table, which makes Queries far more efficient and cost-effective.
32. Distribute items across key space
DynamoDB relies heavily on the concept of partitions. It divides the key space into multiple partitions and distributes the items across them. It is extremely important to try to distribute reads and writes evenly across the key space and partitions because otherwise, one of your partitions might become "hot" and some operations might get throttled.
33. In single-table designs, use a "type" attribute
It allows you to quickly distinguish item types without relying on checking keys or asserting the presence of attributes. This simplifies your data model and makes your queries more efficient.
34. Use GSIs and LSIs to enable new access patterns
Instead of using Scans to fetch data that isn't easy to get with Queries, use Global (or Local) Secondary Indexes to index the data on the required fields. This lets you fetch data using fast Queries based on the attribute values of the item.
35. Use generic GSI and LSI names
"The only constant thing is change," especially in software development. Requirements change frequently, and it is important to have a generic name for your indexes so you can avoid doing migrations as they change. This flexibility can save significant time and effort in the long run.
36. Use LeadingKeys for fine-grained IAM control
For fine-grained access control, e.g., in multi-tenant applications, use an IAM policy with the dynamodb:LeadingKeys condition. It restricts access so that users can only read/write items whose partition key value matches their own identity, ensuring data isolation and security.
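A sketch of such a policy statement, as it might appear in a Serverless/CDK definition; the table ARN and the Cognito identity variable are placeholders:

```js
// Restricts a Cognito-authenticated user to items whose partition key equals their identity id
const tenantIsolationPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: ['dynamodb:GetItem', 'dynamodb:Query', 'dynamodb:PutItem'],
      Resource: 'arn:aws:dynamodb:us-east-1:123456789012:table/Tenants', // placeholder ARN
      Condition: {
        'ForAllValues:StringEquals': {
          'dynamodb:LeadingKeys': ['${cognito-identity.amazonaws.com:sub}'],
        },
      },
    },
  ],
};
```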
37. Get around item size limit
DynamoDB items are limited to 400KB. If your entities are bigger than that, consider denormalizing and splitting them into multiple items sharing the same partition key. A useful mental model is to think of the partition key as a directory/folder and the sort key as a file name. Once you're in the correct folder, getting data from any file within that folder is pretty straightforward.
38. Leverage sort keys flexibility
With composite sort keys, you can define hierarchical relationships in your data. This way, you can query it at any level of the hierarchy and achieve multiple access patterns with just one field.
For example, with a sort key structured like [company]#[department]#[team] and a begins_with clause (see the sketch after this list), you can get:
- All members of a company
- All members of a department within a company
- All members of a team within a department
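Here's the sketch mentioned above, assuming a hypothetical "Staff" table whose sort key "path" holds the [company]#[department]#[team] hierarchy:

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function getDepartmentMembers() {
  const { Items } = await documentClient
    .query({
      TableName: 'Staff',
      KeyConditionExpression: 'orgId = :orgId AND begins_with(#path, :prefix)',
      ExpressionAttributeNames: { '#path': 'path' }, // aliased per tip 16
      ExpressionAttributeValues: {
        ':orgId': 'org-1',
        // widen or narrow the prefix to move up or down the hierarchy
        ':prefix': 'acme#engineering#',
      },
    })
    .promise();
  return Items;
}
```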
39. Use ConditionExpressions
If you need to insert data conditionally, use ConditionExpressions instead of getting an item, checking its properties, and then calling a Put operation. The get-then-put approach takes two calls, is more complicated, and is not atomic; ConditionExpressions keep your writes atomic and consistent.
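For example, a "create only if it doesn't exist yet" put in a single call (table and attribute names are hypothetical):

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function createUser(id, email) {
  try {
    await documentClient
      .put({
        TableName: 'Users',
        Item: { id, email },
        // The write is rejected atomically if an item with this key already exists
        ConditionExpression: 'attribute_not_exists(id)',
      })
      .promise();
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      throw new Error(`User ${id} already exists`);
    }
    throw err;
  }
}
```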
40. Inverted Index
Inverted indexes are secondary indexes that use the hash key of your table as the sort/range key of the index, and the sort key of your table as the partition/hash key of the index. This technique is useful, e.g., in many-to-many relationships where you need to query the other side of the bi-directional relationship.
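For instance, if a table stores follower relationships as PK = follower and SK = followee, a query against a hypothetical inverted GSI answers the opposite question, "who follows this user?":

```js
const AWS = require('aws-sdk');

const documentClient = new AWS.DynamoDB.DocumentClient();

async function getFollowers(userId) {
  const { Items } = await documentClient
    .query({
      TableName: 'Relationships',
      IndexName: 'InvertedIndex', // hypothetical GSI: PK = table's SK, SK = table's PK
      KeyConditionExpression: 'SK = :sk',
      ExpressionAttributeValues: { ':sk': `USER#${userId}` },
    })
    .promise();
  return Items;
}
```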
41. Monitor and Optimize Costs
Regularly monitor your DynamoDB usage and costs using AWS Cost Explorer and CloudWatch. Identify any unexpected spikes in usage or costs and optimize your table configurations accordingly. Consider using reserved capacity if you have predictable workloads to save on costs.