1. Use AWS.DynamoDB.DocumentClient instead of AWS.DynamoDB
When writing JS/TS code for DynamoDB, use AWS.DynamoDB.DocumentClient instead of the low-level AWS.DynamoDB client. It’s much simpler to use and converts requests and responses to and from native JS types.
2. Store Blobs in S3 instead of DynamoDB
While DynamoDB allows storing Blobs inside tables (e.g. pictures using Binary or base64 format), it's a much better practice to upload actual assets to S3 and store only links to them inside the database.
3. Use Promises instead of Callbacks
Instead of passing callbacks to SDK calls, append .promise() to them. This makes the library return a Promise which you can await, making your code much more elegant.
4. Use BatchGetItem API for querying multiple tables
Use BatchGetItem API to get up to 100 items identified by primary key from multiple DynamoDB tables at once in parallel instead of using Promise.all wrapping a series of calls. It’s much faster but can only return up to 16MB of data.
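A minimal sketch of the request shape for a multi-table batch get, assuming two hypothetical tables `Users` and `Orders` whose partition key attribute is `pk`. It only builds the params object you would pass to `docClient.batchGet(params).promise()` in the v2 SDK, and enforces the 100-key limit up front:

```typescript
// Key in DocumentClient (native JS) form; the attribute name "pk" is an assumption.
type Key = Record<string, string | number>;

interface BatchGetParams {
  RequestItems: Record<string, { Keys: Key[] }>;
}

// Builds one BatchGet request spanning several tables.
// DynamoDB accepts at most 100 keys per call, counted across all tables.
function buildBatchGet(keysByTable: Record<string, Key[]>): BatchGetParams {
  const total = Object.values(keysByTable).reduce((n, keys) => n + keys.length, 0);
  if (total > 100) {
    throw new Error(`BatchGetItem accepts at most 100 keys, got ${total}`);
  }
  const RequestItems: BatchGetParams["RequestItems"] = {};
  for (const [table, keys] of Object.entries(keysByTable)) {
    if (keys.length > 0) RequestItems[table] = { Keys: keys };
  }
  return { RequestItems };
}

// Hypothetical tables "Users" and "Orders".
const params = buildBatchGet({
  Users: [{ pk: "user#1" }, { pk: "user#2" }],
  Orders: [{ pk: "order#42" }],
});
```

Keys that DynamoDB could not process in a single call are returned in `UnprocessedKeys` and should be retried.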
5. Use ‘AttributesToGet’ to make API responses faster
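In order to make your API responses faster, add the AttributesToGet parameter (or its newer equivalent, ProjectionExpression) to get calls. This will return less data from the DynamoDB table and potentially reduce overhead on data transport and un/marshalling. Note that read capacity is still consumed for the full item size.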
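A sketch of a get request that uses ProjectionExpression, assuming a hypothetical `Users` table keyed by `pk`. Each attribute is aliased through ExpressionAttributeNames so that reserved words such as `name` don't cause validation errors:

```typescript
// Params for DocumentClient.get in the v2 SDK.
interface GetParams {
  TableName: string;
  Key: Record<string, string | number>;
  ProjectionExpression?: string;
  ExpressionAttributeNames?: Record<string, string>;
}

// Builds a get request that returns only the listed attributes.
// Every attribute is aliased as "#a0", "#a1", ... to sidestep reserved words.
function buildProjectedGet(
  tableName: string,
  key: GetParams["Key"],
  attributes: string[],
): GetParams {
  const names: Record<string, string> = {};
  const placeholders = attributes.map((attr, i) => {
    const placeholder = `#a${i}`;
    names[placeholder] = attr;
    return placeholder;
  });
  return {
    TableName: tableName,
    Key: key,
    ProjectionExpression: placeholders.join(", "),
    ExpressionAttributeNames: names,
  };
}

const params = buildProjectedGet("Users", { pk: "user#1" }, ["name", "email"]);
```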
6. Use 'convertEmptyValues: true' in DocumentClient
In the JS SDK, use convertEmptyValues: true in the DocumentClient constructor to prevent silly validation exceptions by automatically converting empty values (empty strings, sets, and binary buffers) to the NULL type when persisting to DynamoDB.
7. Use DynamoDB.batchWriteItem for batch writes
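Use DynamoDB.batchWriteItem to write up to 16MB of data, or make up to 25 put or delete requests across multiple tables, with a single API call. This reduces the overhead of issuing many separate HTTP requests. Note that batch writes don't support updates and aren't atomic: requests DynamoDB couldn't process are returned in UnprocessedItems.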
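A minimal sketch that splits an arbitrary list of items into BatchWrite requests of at most 25 puts each, assuming a single hypothetical table `Items`. Each resulting object can be passed to `docClient.batchWrite(params).promise()` in the v2 SDK:

```typescript
type Item = Record<string, unknown>;

interface BatchWriteParams {
  RequestItems: Record<string, Array<{ PutRequest: { Item: Item } }>>;
}

// BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_WRITE = 25;

// Splits items into BatchWrite requests of up to 25 puts each for one table.
function buildBatchWrites(tableName: string, items: Item[]): BatchWriteParams[] {
  const batches: BatchWriteParams[] = [];
  for (let i = 0; i < items.length; i += MAX_BATCH_WRITE) {
    const chunk = items.slice(i, i + MAX_BATCH_WRITE);
    batches.push({
      RequestItems: {
        [tableName]: chunk.map((Item) => ({ PutRequest: { Item } })),
      },
    });
  }
  return batches;
}

// 60 hypothetical items become 3 requests of 25, 25 and 10 puts.
const items = Array.from({ length: 60 }, (_, i) => ({ pk: `item#${i}` }));
const batches = buildBatchWrites("Items", items);
```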
8. Use IAM policies for security and enforcing best practices
You can use IAM policies to enforce best practices and restrict your developers and services from doing expensive Scan operations on DynamoDB tables.
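A sketch of such a policy, written as a TypeScript object that serializes to a standard IAM policy document. The account ID, region and `Orders` table name in the ARNs are placeholders. The second statement explicitly denies Scan on the table and its indexes; in IAM, an explicit Deny overrides any Allow:

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Placeholder ARN; replace region, account ID and table name with your own.
const tableArn = "arn:aws:dynamodb:us-east-1:123456789012:table/Orders";

const denyScanPolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      // Allow the usual key-based operations...
      Effect: "Allow",
      Action: ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem", "dynamodb:UpdateItem"],
      Resource: [tableArn, `${tableArn}/index/*`],
    },
    {
      // ...but deny Scan on the table and on all of its indexes.
      Effect: "Deny",
      Action: ["dynamodb:Scan"],
      Resource: [tableArn, `${tableArn}/index/*`],
    },
  ],
};

console.log(JSON.stringify(denyScanPolicy, null, 2));
```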
9. Use DynamoDB Streams for data post-processing
Once data gets saved to the table, a λ function subscribing to the stream can validate values, enrich information, or aggregate metrics. This decouples your core business logic from side-effects.
10. Use 'FilterExpressions'
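Use FilterExpressions to refine and narrow your Query and Scan results on non-indexed fields. Keep in mind that the filter is applied after the items are read, so it reduces the amount of data returned but not the read capacity consumed.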
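A sketch of Query params that combine a key condition with a filter on a non-key attribute. The `Orders` table, its `pk` partition key and the `total` attribute are hypothetical:

```typescript
interface QueryParams {
  TableName: string;
  KeyConditionExpression: string;
  FilterExpression?: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// Reads one customer's orders by key, then filters on the non-key "total" attribute.
function buildLargeOrdersQuery(customerId: string, minTotal: number): QueryParams {
  return {
    TableName: "Orders",
    KeyConditionExpression: "#pk = :pk",
    // Evaluated after the items are read and before they are returned.
    FilterExpression: "#total >= :minTotal",
    ExpressionAttributeNames: { "#pk": "pk", "#total": "total" },
    ExpressionAttributeValues: { ":pk": `customer#${customerId}`, ":minTotal": minTotal },
  };
}

const params = buildLargeOrdersQuery("123", 100);
```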
11. Use 'Parallel Scan' to scan through big datasets
If you need to scan through a big dataset quickly, use "Parallel Scan", which divides your table into N segments and lets multiple workers scan separate segments concurrently.
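A minimal sketch that produces one Scan params object per segment, assuming a hypothetical `Events` table. Each worker would scan its own segment and keep following `LastEvaluatedKey` (passed back as `ExclusiveStartKey`) until it is undefined:

```typescript
interface ScanSegmentParams {
  TableName: string;
  Segment: number;
  TotalSegments: number;
  ExclusiveStartKey?: Record<string, unknown>;
}

// One params object per segment; segments are numbered 0..totalSegments-1.
function buildParallelScan(tableName: string, totalSegments: number): ScanSegmentParams[] {
  if (!Number.isInteger(totalSegments) || totalSegments < 1 || totalSegments > 1000000) {
    throw new Error("TotalSegments must be an integer between 1 and 1000000");
  }
  return Array.from({ length: totalSegments }, (_, segment) => ({
    TableName: tableName,
    Segment: segment,
    TotalSegments: totalSegments,
  }));
}

const segments = buildParallelScan("Events", 4);
```

Each of these objects can then be handed to a separate worker (for example via `Promise.all` over `docClient.scan(p).promise()` calls) so that the segments are read concurrently.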
12. Set 'AWS_NODEJS_CONNECTION_REUSE_ENABLED' to 1
To reduce the overhead of connecting to DynamoDB, make sure the environment variable
AWS_NODEJS_CONNECTION_REUSE_ENABLED is set to 1 to make the SDK reuse connections by default.
13. Use 'TransactWriteItems' to update multiple records atomically
If you need to update multiple records atomically, use Transactions and the TransactWriteItems function to write up to 100 items atomically across tables at once. Either all of the writes succeed or none of them do.
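A sketch of TransactWriteItems params for a money transfer, assuming hypothetical `Accounts` and `Transfers` tables keyed by `pk` with a numeric `balance` attribute. The condition on the first update cancels the whole transaction if the source balance is too low, and the condition on the put prevents recording the same transfer twice. In the v2 SDK these params would go to `docClient.transactWrite(params).promise()`:

```typescript
interface UpdateAction {
  TableName: string;
  Key: Record<string, unknown>;
  UpdateExpression: string;
  ConditionExpression?: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

interface PutAction {
  TableName: string;
  Item: Record<string, unknown>;
  ConditionExpression?: string;
  ExpressionAttributeNames?: Record<string, string>;
}

type TransactItem = { Update: UpdateAction } | { Put: PutAction };

function buildTransfer(
  from: string,
  to: string,
  amount: number,
  transferId: string,
): { TransactItems: TransactItem[] } {
  return {
    TransactItems: [
      {
        Update: {
          TableName: "Accounts",
          Key: { pk: from },
          UpdateExpression: "SET #bal = #bal - :amt",
          // Fails (and rolls back everything) if the balance would go negative.
          ConditionExpression: "#bal >= :amt",
          ExpressionAttributeNames: { "#bal": "balance" },
          ExpressionAttributeValues: { ":amt": amount },
        },
      },
      {
        Update: {
          TableName: "Accounts",
          Key: { pk: to },
          UpdateExpression: "SET #bal = #bal + :amt",
          ExpressionAttributeNames: { "#bal": "balance" },
          ExpressionAttributeValues: { ":amt": amount },
        },
      },
      {
        Put: {
          TableName: "Transfers",
          Item: { pk: transferId, from, to, amount },
          // Rejects a duplicate transfer ID.
          ConditionExpression: "attribute_not_exists(#pk)",
          ExpressionAttributeNames: { "#pk": "pk" },
        },
      },
    ],
  };
}

const params = buildTransfer("acct#1", "acct#2", 50, "transfer#abc");
```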
14. Use Contributor Insights from Day 1
Use Contributor Insights from day one to identify the most accessed items and the most throttled keys, which might cause you performance problems.
15. Use VPC endpoints to make your connections more secure
Use VPC endpoints when using DynamoDB from a private subnet to make your connections more secure and remove the need for a public IP.
16. Always use 'ExpressionAttributeNames' in your queries
Because DynamoDB has over 500 reserved keywords, always use ExpressionAttributeNames to prevent errors like
ValidationException - Invalid FilterExpression: Attribute name is a reserved keyword; reserved keyword: XXX
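A sketch of an update on an attribute called `status`, which is a DynamoDB reserved word: writing `SET status = :status` directly would be rejected, while the `#status` placeholder works. The `Orders` table and its `pk` key are hypothetical; in the v2 SDK the params would go to `docClient.update(params).promise()`:

```typescript
interface UpdateParams {
  TableName: string;
  Key: Record<string, unknown>;
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// "#status" is resolved to the real attribute name via ExpressionAttributeNames.
function buildStatusUpdate(orderId: string, status: string): UpdateParams {
  return {
    TableName: "Orders",
    Key: { pk: orderId },
    UpdateExpression: "SET #status = :status",
    ExpressionAttributeNames: { "#status": "status" },
    ExpressionAttributeValues: { ":status": status },
  };
}

const params = buildStatusUpdate("order#42", "SHIPPED");
```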
17. Use caching for frequently accessed items
Invest in caching early. Even the smallest caches can slice your DynamoDB bill by up to 80%. It works especially well for frequently accessed items.
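A minimal sketch of an in-process, read-through cache with a per-entry time-to-live. The `load` callback stands for whatever fetches the item from DynamoDB (for example a DocumentClient get); managed options such as DynamoDB Accelerator (DAX) solve the same problem outside your process:

```typescript
// Read-through cache: returns a cached value until it expires, otherwise loads it.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    // Injectable clock so expiry can be tested deterministically.
    private readonly now: () => number = () => Date.now(),
  ) {}

  async getOrLoad(key: string, load: (key: string) => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value; // served from memory, no DynamoDB read
    }
    const value = await load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

A short TTL keeps data reasonably fresh while still absorbing repeated reads of the same hot items.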
18. Use On-Demand to identify traffic patterns
If you're unsure of your traffic patterns, use On-Demand capacity mode so your DynamoDB table scales in line with the amount of read and write requests.
19. For Billing, start with On-Demand, then switch to Provisioned
20. Use 'DynamoDB Global Tables' for latency-critical applications
If latency to the end-user is crucial for your application, use DynamoDB Global Tables, which automatically replicate data across multiple regions. This way, your data is closer to the end-user. For the compute part, use Lambda@Edge functions.
21. Use 'createdAt' and 'updatedAt' attributes
Add createdAt and updatedAt attributes to each item. Moreover, instead of removing records from the table, simply add a
deletedAt attribute. It not only makes your delete operations reversible but also enables some auditing.
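A sketch of a soft delete as an UpdateItem, assuming a hypothetical `Documents` table keyed by `pk` and ISO 8601 timestamps. The condition makes the call fail if the item doesn't exist, and makes a second soft delete fail instead of overwriting the original `deletedAt`:

```typescript
interface UpdateParams {
  TableName: string;
  Key: Record<string, unknown>;
  UpdateExpression: string;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// Marks an item as deleted (and touches updatedAt) instead of removing it.
function buildSoftDelete(pk: string, now: Date): UpdateParams {
  const timestamp = now.toISOString();
  return {
    TableName: "Documents",
    Key: { pk },
    UpdateExpression: "SET #deletedAt = :now, #updatedAt = :now",
    ConditionExpression: "attribute_exists(#pk) AND attribute_not_exists(#deletedAt)",
    ExpressionAttributeNames: { "#pk": "pk", "#deletedAt": "deletedAt", "#updatedAt": "updatedAt" },
    ExpressionAttributeValues: { ":now": timestamp },
  };
}

const params = buildSoftDelete("doc#1", new Date(Date.UTC(2021, 0, 1)));
```

Reads then typically exclude soft-deleted items, for example with a filter on `attribute_not_exists(deletedAt)`.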
22. Aggregate, a lot
Instead of periodically running expensive queries, e.g. for analytics purposes, use DynamoDB Streams connected to a Lambda function. It will update the result of an aggregation just-in-time whenever data changes.
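A sketch of the aggregation step such a Lambda function might perform: turning a batch of stream records into per-category count deltas. It assumes the stream uses the NEW_AND_OLD_IMAGES view type and that items carry a hypothetical `category` string attribute; the handler would then apply each delta with an UpdateItem like `ADD #count :delta` on an aggregate item:

```typescript
// Minimal subset of the DynamoDB Streams record shape delivered to Lambda.
// Images use the low-level AttributeValue format, e.g. { category: { S: "books" } }.
type AttributeValue = { S?: string; N?: string };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    NewImage?: Record<string, AttributeValue>;
    OldImage?: Record<string, AttributeValue>;
  };
}

// Returns how much each category's item count changed in this batch.
function countDeltas(records: StreamRecord[]): Map<string, number> {
  const deltas = new Map<string, number>();
  const bump = (category: string | undefined, by: number) => {
    if (category === undefined) return;
    deltas.set(category, (deltas.get(category) ?? 0) + by);
  };
  for (const record of records) {
    const oldCategory = record.dynamodb.OldImage?.category?.S;
    const newCategory = record.dynamodb.NewImage?.category?.S;
    if (record.eventName === "INSERT") {
      bump(newCategory, 1);
    } else if (record.eventName === "REMOVE") {
      bump(oldCategory, -1);
    } else if (oldCategory !== newCategory) {
      // A MODIFY that moved the item to another category.
      bump(oldCategory, -1);
      bump(newCategory, 1);
    }
  }
  return deltas;
}
```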
23. Leverage write sharding
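To avoid hot partitions and spread the load more evenly across them, make sure your partition keys have high cardinality. You can achieve that by appending a random number from a fixed range (for example 0 to 9) to the end of the partition key values. The trade-off is that reading a logical key then requires querying every suffix and merging the results.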
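A minimal sketch of random-suffix write sharding. The shard count of 10 and the `#` separator are arbitrary choices for illustration; the random source is injectable so the behavior can be tested:

```typescript
// Number of shards per logical key; 10 is an arbitrary example value.
const SHARD_COUNT = 10;

// Writes: pick a random shard so one hot logical key spreads across partitions.
function shardedKey(baseKey: string, random: () => number = () => Math.random()): string {
  const shard = Math.floor(random() * SHARD_COUNT); // 0..SHARD_COUNT-1
  return `${baseKey}#${shard}`;
}

// Reads: every shard has to be queried and the results merged.
function allShardKeys(baseKey: string): string[] {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${baseKey}#${i}`);
}

// e.g. all events for one (hypothetical) day key
const writeKey = shardedKey("2021-06-01");
const readKeys = allShardKeys("2021-06-01");
```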
24. Large attributes compression
If storing large blobs outside of DynamoDB, e.g. in S3, is not an option because of the increased latency, compress them with a common algorithm like GZIP before saving them to DynamoDB.
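A sketch using Node's built-in `zlib` module: compress the payload before writing it as a Binary attribute and decompress it after reading. It assumes the compressed Buffer is stored as a Binary (B) attribute, which the DocumentClient should marshal from a Node.js Buffer:

```typescript
import { gzipSync, gunzipSync } from "zlib";

// Compress a large text attribute before writing it as a Binary attribute.
function compressAttribute(text: string): Buffer {
  return gzipSync(Buffer.from(text, "utf8"));
}

// Decompress the Binary attribute after reading it back.
function decompressAttribute(data: Buffer | Uint8Array): string {
  return gunzipSync(data).toString("utf8");
}

// Repetitive JSON compresses very well.
const original = JSON.stringify({ body: "lorem ipsum ".repeat(1000) });
const compressed = compressAttribute(original);
```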
25. Date formats
Use the epoch date format (in seconds) if you want to support the DynamoDB TTL feature and all the numeric filters. If you don't need TTL, you can also use the ISO 8601 format; because ISO 8601 strings in UTC sort lexicographically, filters like "between" work too.
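A sketch of both formats on one item. The TTL attribute name is configured on the table; `expiresAt` and the `pk` value here are hypothetical:

```typescript
// TTL expects a Number attribute holding Unix epoch time in *seconds*.
function ttlInDays(days: number, from: Date = new Date()): number {
  return Math.floor(from.getTime() / 1000) + days * 24 * 60 * 60;
}

// ISO 8601 strings in UTC sort lexicographically, so "between" works on them.
function isoTimestamp(date: Date): string {
  return date.toISOString();
}

const created = new Date(Date.UTC(2021, 0, 1));
const item = {
  pk: "session#abc",
  createdAt: isoTimestamp(created), // "2021-01-01T00:00:00.000Z"
  expiresAt: ttlInDays(7, created), // 1610064000
};
```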
26. Use DynamoDB Local
If you need to mock your DynamoDB table(s) for local development or tests on your machine, you can use Localstack or DynamoDB Local to run DynamoDB as a Docker container without connectivity to the cloud. It supports most of the DynamoDB APIs.
27. Remember to backup
Before going live in production, remember to enable point-in-time recovery (PITR), so you can restore your table to an earlier point in time in case of an error.
28. Starting to use DynamoDB? Try PartiQL
If you're just starting to use DynamoDB, you might be initially discouraged by its unconventional query syntax. If you want to use an SQL-like query language to work with DynamoDB, you can do that using PartiQL.
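A sketch of parameterized PartiQL params for the ExecuteStatement API (the low-level client's `executeStatement` in the v2 SDK). The `Orders` table and `pk` attribute are hypothetical. Because the WHERE clause matches the partition key, DynamoDB can run it as a Query; a SELECT without a key condition becomes a full Scan:

```typescript
// Low-level AttributeValue subset used for statement parameters.
type AttributeValue = { S: string } | { N: string };

interface ExecuteStatementParams {
  Statement: string;
  Parameters?: AttributeValue[];
}

// "?" placeholders are bound from Parameters in order, avoiding string concatenation.
function selectOrdersByCustomer(customerId: string): ExecuteStatementParams {
  return {
    Statement: 'SELECT * FROM "Orders" WHERE "pk" = ?',
    Parameters: [{ S: `customer#${customerId}` }],
  };
}

const params = selectOrdersByCustomer("123");
```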
29. Use S3 Export and Athena for full-table analytics
If you need to perform whole-table operations like
SELECT COUNT WHERE, export your table to S3 first and then use Athena or any other suitable tool to do so.
30. Use TTL to remove expired items
31. Avoid using Scans
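DynamoDB has two primary methods of fetching collections of data: Query and Scan. While they might seem similar, you should always favor Queries over Scans: a Query reads only the items under a single partition key, while a Scan reads every item in the table and consumes read capacity for all of them.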
32. Distribute items across key space
DynamoDB relies heavily on the concept of partitions. It divides the key space into multiple partitions and distributes the items across them. It is extremely important to distribute reads and writes evenly across the key space and partitions; otherwise, one of your partitions might become "hot" and some operations might get throttled.
33. In Single-Table designs, use a "type" attribute
It allows you to quickly distinguish item types without relying on checking keys or asserting presence of attributes.
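In TypeScript, a `type` attribute maps naturally onto a discriminated union. A sketch with two hypothetical entity types stored in one table; the compiler narrows each item on `item.type`, so no key parsing is needed:

```typescript
// Hypothetical single-table entities; "type" is the discriminator.
interface UserItem {
  pk: string;
  sk: string;
  type: "User";
  email: string;
}

interface OrderItem {
  pk: string;
  sk: string;
  type: "Order";
  total: number;
}

type TableItem = UserItem | OrderItem;

function describeItem(item: TableItem): string {
  switch (item.type) {
    case "User":
      return `user ${item.email}`; // narrowed to UserItem
    case "Order":
      return `order of ${item.total}`; // narrowed to OrderItem
    default: {
      // Compile-time check that every item type is handled.
      const unreachable: never = item;
      throw new Error(`unknown item ${JSON.stringify(unreachable)}`);
    }
  }
}

// e.g. items returned by one Query on pk = "USER#1"
const items: TableItem[] = [
  { pk: "USER#1", sk: "PROFILE", type: "User", email: "jane@example.com" },
  { pk: "USER#1", sk: "ORDER#2021-06-01", type: "Order", total: 42 },
];
const descriptions = items.map(describeItem);
```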
34. Use GSIs and LSIs to enable new access patterns
Instead of using Scans to fetch data that isn't easy to get using Queries, use Global Secondary Indexes (or Local Secondary Indexes, which keep the table's partition key but use a different sort key) to index the data on the required fields. This allows you to fetch data using fast Queries based on the attribute values of the item.
35. Use generic GSI and LSI names
"The only constant thing is change", especially in software development. Requirements change frequently, and it is important to have a generic name for your indexes so you can avoid doing migrations as they change.
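A sketch of generic index attributes: a hypothetical index named `GSI1` uses the keys `GSI1PK`/`GSI1SK`, and each entity decides what to write into them. Here orders are indexed by status; a later access pattern can reuse the same index name and definition by writing different values:

```typescript
// Generic index key attributes, independent of any one access pattern.
interface IndexedItem {
  pk: string;
  sk: string;
  GSI1PK: string;
  GSI1SK: string;
  [attr: string]: unknown;
}

// Hypothetical order entity: GSI1 currently serves "orders by status".
function orderItem(orderId: string, customerId: string, status: string, createdAt: string): IndexedItem {
  return {
    pk: `ORDER#${orderId}`,
    sk: `ORDER#${orderId}`,
    GSI1PK: `STATUS#${status}`,
    GSI1SK: createdAt,
    customerId,
    status,
  };
}

const item = orderItem("42", "123", "SHIPPED", "2021-06-01T12:00:00.000Z");
```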
36. Use LeadingKeys for fine-grained IAM control
For fine-grained access control, e.g. in multi-tenant applications, use an IAM policy with the
dynamodb:LeadingKeys condition. It restricts access so that users can only read/write items whose partition key value matches their identity, such as their user or tenant ID.
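A sketch of such a policy as a TypeScript object, assuming callers authenticate through Amazon Cognito identity pools so that the `${cognito-identity.amazonaws.com:sub}` policy variable holds their identity ID. The `TenantData` table ARN is a placeholder:

```typescript
// Each Cognito identity may only touch items whose partition key equals its identity ID.
const tenantIsolationPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: [
        "dynamodb:GetItem",
        "dynamodb:Query",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
      ],
      Resource: ["arn:aws:dynamodb:us-east-1:123456789012:table/TenantData"],
      Condition: {
        "ForAllValues:StringEquals": {
          // Plain string (not a template literal): IAM substitutes the variable at request time.
          "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"],
        },
      },
    },
  ],
};

console.log(JSON.stringify(tenantIsolationPolicy, null, 2));
```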
37. Get around item size limit
DynamoDB Items are limited to 400KB. If your entities are bigger than that, consider denormalizing and splitting them into multiple rows sharing the same partition key. A useful mental model is to think of partition key as a directory/folder and sort key as a file name. Once you're in the correct folder, getting data from any file within that folder is pretty straightforward.
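One mechanical way to apply the folder/file model: split a large payload into items that share a partition key, with zero-padded sort keys so they come back in order. The 350 KB chunk size is an arbitrary margin under the 400KB limit, and the `CHUNK#` prefix is hypothetical. Since a Query returns at most 1 MB per page, reassembly has to follow `LastEvaluatedKey`:

```typescript
interface ChunkItem {
  pk: string; // "the folder"
  sk: string; // "the file"
  data: Buffer;
}

// Leaves headroom under the 400KB item limit for keys and attribute names.
const MAX_CHUNK_BYTES = 350 * 1024;

// Splits a payload into items sharing one partition key.
function splitIntoChunks(pk: string, payload: Buffer, maxBytes = MAX_CHUNK_BYTES): ChunkItem[] {
  const chunks: ChunkItem[] = [];
  for (let offset = 0, i = 0; offset < payload.length; offset += maxBytes, i++) {
    chunks.push({
      pk,
      sk: `CHUNK#${String(i).padStart(4, "0")}`,
      data: payload.subarray(offset, offset + maxBytes),
    });
  }
  return chunks;
}

// Reassembles chunks (e.g. from a Query on pk), ordering them by sort key.
function joinChunks(chunks: ChunkItem[]): Buffer {
  const ordered = [...chunks].sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
  return Buffer.concat(ordered.map((c) => c.data));
}
```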
38. Leverage sort keys flexibility
With composite sort keys, you can define hierarchical relationships in your data. This way, you can query it at any level of the hierarchy and achieve multiple access patterns with just one field.
For example, with a key structured hierarchically (company, then department, then team) and a
begins_with clause, you can get:
- All members of a company
- All members of a department within a company
- All members of a team within a department
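A sketch of this pattern under a hypothetical layout: `pk = "COMPANY#<company>"` and `sk = "DEPT#<dept>#TEAM#<team>#MEMBER#<id>"` in a hypothetical `Members` table. One function serves all three access patterns by lengthening the begins_with prefix; the trailing `#` keeps a query for department `eng` from also matching `engineering`:

```typescript
interface MemberQuery {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// Omit dept for the whole company; omit team for the whole department.
function membersQuery(company: string, dept?: string, team?: string): MemberQuery {
  let prefix = "DEPT#";
  if (dept !== undefined) {
    prefix += `${dept}#`;
    if (team !== undefined) prefix += `TEAM#${team}#`;
  }
  return {
    TableName: "Members",
    KeyConditionExpression: "#pk = :pk AND begins_with(#sk, :prefix)",
    ExpressionAttributeNames: { "#pk": "pk", "#sk": "sk" },
    ExpressionAttributeValues: { ":pk": `COMPANY#${company}`, ":prefix": prefix },
  };
}
```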
39. Use 'ConditionExpressions' for conditional writes
If you need to insert data with a condition, use ConditionExpressions instead of getting an item, checking its properties, and then calling a Put operation. That approach takes two calls, is more complicated, and is not atomic.
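A sketch of a "create only if absent" put, assuming a hypothetical `Users` table keyed by `pk`. DynamoDB evaluates the condition and performs the write in a single atomic call; if an item with that key already exists, the SDK throws a `ConditionalCheckFailedException` instead of overwriting it:

```typescript
interface PutParams {
  TableName: string;
  Item: Record<string, unknown>;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
}

// Creates the user only if no item with this key exists yet.
function buildCreateUser(userId: string, email: string): PutParams {
  return {
    TableName: "Users",
    Item: { pk: `USER#${userId}`, email },
    ConditionExpression: "attribute_not_exists(#pk)",
    ExpressionAttributeNames: { "#pk": "pk" },
  };
}

const params = buildCreateUser("1", "jane@example.com");
```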
40. Inverted Index
Inverted indexes are secondary indexes that use the partition (hash) key of your table as the sort (range) key of the index, and the sort key of your table as the partition (hash) key of the index. This technique is useful e.g. in many-to-many relationships where you need to query the other side of a bi-directional relationship.
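A sketch for a hypothetical `Memberships` table where each item links a user to a group (`pk = "USER#<id>"`, `sk = "GROUP#<name>"`). The table answers "groups of a user"; the inverted index, named `InvertedIndex` here for illustration, answers "users in a group". The GSI definition assumes an on-demand table (a provisioned table would also need ProvisionedThroughput):

```typescript
interface KeySchemaElement {
  AttributeName: string;
  KeyType: "HASH" | "RANGE";
}

// GlobalSecondaryIndexes entry for CreateTable: the table's keys, swapped.
const invertedIndex = {
  IndexName: "InvertedIndex",
  KeySchema: [
    { AttributeName: "sk", KeyType: "HASH" },
    { AttributeName: "pk", KeyType: "RANGE" },
  ] as KeySchemaElement[],
  Projection: { ProjectionType: "ALL" },
};

// Query the "other side" of the relationship through the index.
function usersInGroup(group: string) {
  return {
    TableName: "Memberships",
    IndexName: invertedIndex.IndexName,
    KeyConditionExpression: "#sk = :group AND begins_with(#pk, :userPrefix)",
    ExpressionAttributeNames: { "#pk": "pk", "#sk": "sk" },
    ExpressionAttributeValues: { ":group": `GROUP#${group}`, ":userPrefix": "USER#" },
  };
}

const params = usersInGroup("admins");
```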
© 2021 Dynobase