
40 DynamoDB Best Practices [Bite-sized Tips]

Written by Rafal Wilinski

Published on March 30th, 2020


    1. Use AWS.DynamoDB.DocumentClient instead of AWS.DynamoDB

    When writing JS/TS code for DynamoDB, use AWS.DynamoDB.DocumentClient instead of the low-level AWS.DynamoDB client. It's much simpler to use and converts responses to native JS types.
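
    A minimal sketch of the difference (the 'users' table and its 'id' key are made up):

        const AWS = require('aws-sdk');

        const dynamodb = new AWS.DynamoDB();
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function compare() {
          // Low-level client: every value is wrapped in a type descriptor
          const raw = await dynamodb
            .getItem({ TableName: 'users', Key: { id: { S: '123' } } })
            .promise();
          console.log(raw.Item); // { id: { S: '123' }, name: { S: 'Alice' } }

          // DocumentClient: plain JS values in, plain JS values out
          const doc = await docClient
            .get({ TableName: 'users', Key: { id: '123' } })
            .promise();
          console.log(doc.Item); // { id: '123', name: 'Alice' }
        }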

    2. Store Blobs in S3 instead of DynamoDB

    While DynamoDB allows storing Blobs inside tables (e.g. pictures using Binary or base64 format), it's a much better practice to upload actual assets to S3 and store only links to them inside the database.
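
    A sketch of the pattern, assuming a hypothetical 'my-assets' bucket, 'users' table, and 'avatarUrl' attribute:

        const AWS = require('aws-sdk');

        const s3 = new AWS.S3();
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function saveAvatar(userId, imageBuffer) {
          // Upload the binary asset to S3...
          const { Location } = await s3
            .upload({ Bucket: 'my-assets', Key: `avatars/${userId}.png`, Body: imageBuffer })
            .promise();

          // ...and store only its URL in DynamoDB
          await docClient
            .update({
              TableName: 'users',
              Key: { id: userId },
              UpdateExpression: 'SET avatarUrl = :url',
              ExpressionAttributeValues: { ':url': Location },
            })
            .promise();
        }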

    3. Use Promises instead of Callbacks

    When working with DynamoDB, instead of using JavaScript callbacks, end each call with .promise(). The SDK then returns a Promise that you can .then() or await, which makes your code much more elegant.
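
    The same call in both styles (table and key are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        // Callback style:
        docClient.get({ TableName: 'users', Key: { id: '123' } }, (err, data) => {
          if (err) return console.error(err);
          console.log(data.Item);
        });

        // Promise style: the same call, ended with .promise()
        async function getUser(id) {
          const { Item } = await docClient
            .get({ TableName: 'users', Key: { id } })
            .promise();
          return Item;
        }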

    4. Use BatchGetItem API for querying multiple tables

    Use the BatchGetItem API to get up to 100 items, identified by primary key, from multiple DynamoDB tables at once instead of wrapping a series of GetItem calls in Promise.all. It's much faster but can only return up to 16MB of data.
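
    A sketch fetching from two hypothetical tables in one request:

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function fetchBatch() {
          const { Responses } = await docClient
            .batchGet({
              RequestItems: {
                users: { Keys: [{ id: '1' }, { id: '2' }] },
                orders: { Keys: [{ orderId: 'A' }, { orderId: 'B' }] },
              },
            })
            .promise();
          // Items come back grouped per table; also check UnprocessedKeys
          console.log(Responses.users, Responses.orders);
        }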

    5. Use ‘AttributesToGet’ to make API responses faster

    In order to make responses of your API faster, add the AttributesToGet parameter to your get calls. This returns less data from the DynamoDB table and can reduce the overhead of data transport and un/marshalling. (AttributesToGet is a legacy parameter; ProjectionExpression is its newer equivalent.)
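
    For example, returning only two attributes of a hypothetical user item:

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function getUserName(id) {
          const { Item } = await docClient
            .get({
              TableName: 'users',
              Key: { id },
              AttributesToGet: ['id', 'name'], // only these attributes are returned
            })
            .promise();
          return Item;
        }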

    6. Use 'convertEmptyValues: true' in DocumentClient

    In the JS SDK, pass convertEmptyValues: true to the DocumentClient constructor to prevent needless validation exceptions by automatically converting empty strings, sets, and binary values to the NULL type when persisting to DynamoDB.
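
    A minimal sketch (table and item are made up):

        const AWS = require('aws-sdk');

        // Empty strings, sets, and binary values are stored as NULL
        // instead of triggering a validation error
        const docClient = new AWS.DynamoDB.DocumentClient({ convertEmptyValues: true });

        async function saveUser() {
          await docClient
            .put({ TableName: 'users', Item: { id: '123', nickname: '' } })
            .promise();
        }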

    7. Use DynamoDB.batchWriteItem for batch writes

    Use DynamoDB.batchWriteItem calls to make up to 25 put or delete requests, totalling as much as 16MB of data, across multiple tables in a single API call. This reduces the overhead of establishing HTTP connections.
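
    A sketch mixing puts and deletes in one call (table and items are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function writeBatch() {
          const result = await docClient
            .batchWrite({
              RequestItems: {
                users: [
                  { PutRequest: { Item: { id: '1', name: 'Alice' } } },
                  { DeleteRequest: { Key: { id: '2' } } },
                ],
              },
            })
            .promise();
          // Anything throttled shows up here and should be retried
          console.log(result.UnprocessedItems);
        }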

    8. Use IAM policies for security and enforcing best practices

    You can use IAM policies to enforce best practices and restrict your developers and services from doing expensive Scan operations on DynamoDB tables.

    9. Use DynamoDB Streams for data post-processing

    Once data gets saved to the table, a Lambda function subscribed to the stream can validate values, enrich information, or aggregate metrics, decoupling your core business logic from side effects. Learn more about DynamoDB Streams.
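
    A sketch of such a handler, assuming the stream is configured with a view type that includes new images:

        const AWS = require('aws-sdk');

        exports.handler = async (event) => {
          for (const record of event.Records) {
            if (record.eventName === 'INSERT') {
              // Stream images arrive in the low-level format; unmarshall them first
              const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
              console.log('New item saved:', item);
              // ...validate, enrich, or update aggregated metrics here
            }
          }
        };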

    10. Use 'FilterExpressions'

    Use FilterExpressions to refine and narrow your Query and Scan results on non-indexed fields.
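
    For example, filtering query results on a non-key attribute (all names are hypothetical). Note that the filter is applied after items are read, so read capacity is still consumed for the pre-filter result set:

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function bigOrders(userId) {
          const { Items } = await docClient
            .query({
              TableName: 'orders',
              KeyConditionExpression: 'userId = :u', // key condition uses the index
              FilterExpression: 'amount > :min',     // filter runs on a non-indexed field
              ExpressionAttributeValues: { ':u': userId, ':min': 100 },
            })
            .promise();
          return Items;
        }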

    11. Use 'Parallel Scan' to scan through big datasets

    If you need to scan through a big dataset fast, use "Parallel Scan", which divides your table into N segments and lets multiple processes scan separate parts concurrently. Learn more about DynamoDB Parallel Scans.
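
    A sketch with four segments scanned concurrently (table name is made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function parallelScan() {
          const TOTAL_SEGMENTS = 4;
          const pages = await Promise.all(
            Array.from({ length: TOTAL_SEGMENTS }, (_, segment) =>
              docClient
                .scan({ TableName: 'users', Segment: segment, TotalSegments: TOTAL_SEGMENTS })
                .promise()
            )
          );
          // For big tables, each segment also needs LastEvaluatedKey pagination
          return pages.flatMap((page) => page.Items);
        }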

    12. Set 'AWS_NODEJS_CONNECTION_REUSE_ENABLED' to 1

    To reduce the overhead of connecting to DynamoDB, make sure the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED is set to 1 so the SDK reuses HTTP connections instead of opening a new one for every request.

    13. Use 'TransactWriteItems' to update multiple records atomically

    If you need to update multiple records atomically, use transactions and the TransactWriteItems call to write up to 10 items across tables in a single all-or-nothing operation: either every write succeeds or none of them do.
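
    A classic sketch: moving funds between two hypothetical account items, where both writes succeed or neither does. (Limits have been raised over time; check the current service quotas.)

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function transfer(from, to, amount) {
          await docClient
            .transactWrite({
              TransactItems: [
                {
                  Update: {
                    TableName: 'accounts',
                    Key: { id: from },
                    UpdateExpression: 'SET balance = balance - :amt',
                    ConditionExpression: 'balance >= :amt', // never overdraw
                    ExpressionAttributeValues: { ':amt': amount },
                  },
                },
                {
                  Update: {
                    TableName: 'accounts',
                    Key: { id: to },
                    UpdateExpression: 'SET balance = balance + :amt',
                    ExpressionAttributeValues: { ':amt': amount },
                  },
                },
              ],
            })
            .promise();
        }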

    14. Use Contributor Insights from Day 1

    Use Contributor Insights from day one to identify your most accessed items and most throttled keys, which might cause performance problems.

    15. Use VPC endpoints to make your connections more secure

    Use VPC endpoints when using DynamoDB from a private subnet to make your connections more secure and remove the need for a public IP.

    16. Always use 'ExpressionAttributeNames' in your queries

    Because DynamoDB has over 500 reserved keywords, always use ExpressionAttributeNames to avoid errors like ValidationException - Invalid FilterExpression: Attribute name is a reserved keyword; reserved keyword: XXX.
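
    For example, 'name' and 'status' are both reserved words, so they must be aliased (table and attributes are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function activeUser(id) {
          const { Items } = await docClient
            .query({
              TableName: 'users',
              KeyConditionExpression: 'id = :id',
              FilterExpression: '#status = :s',
              ProjectionExpression: '#name, #status',
              ExpressionAttributeNames: { '#name': 'name', '#status': 'status' },
              ExpressionAttributeValues: { ':id': id, ':s': 'active' },
            })
            .promise();
          return Items;
        }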

    17. Use caching for frequently accessed items

    Invest in caching early. Even the smallest caches can slice your DynamoDB bill by up to 80%. It works especially well for frequently accessed items.

    18. Use On-Demand to identify traffic patterns

    If you're unsure of your traffic patterns, use On-Demand capacity mode, which scales your DynamoDB table automatically with the number of read and write requests.

    19. For Billing, start with On-Demand, then switch to Provisioned

    Use On-Demand capacity mode to identify your traffic patterns. Once discovered, switch to provisioned mode with autoscaling enabled to save money.

    20. Use 'DynamoDB Global Tables' for latency-critical applications

    If latency to the end-user is crucial for your application, use DynamoDB Global Tables, which automatically replicate data across multiple regions. This way, your data is closer to the end-user. For the compute part, use Lambda@Edge functions.

    21. Use 'createdAt' and 'updatedAt' attributes

    Add createdAt and updatedAt attributes to each item. Moreover, instead of removing records from the table, simply add deletedAt attribute. It will not only make your delete operations reversible but also enable some auditing.

    22. Aggregate, a lot

    Instead of periodically running expensive queries, e.g. for analytics purposes, use DynamoDB Streams connected to a Lambda function. It will update the result of an aggregation just-in-time whenever data changes.

    23. Leverage write sharding

    To avoid hot partitions and spread the load more evenly across them, make sure your partition keys have high cardinality. You can achieve that by adding a random number to the end of the partition key values.
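
    A sketch of suffix-based sharding for a write-heavy, date-keyed table (all names are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        const SHARD_COUNT = 10;

        async function recordEvent(date, payload) {
          // A random suffix spreads one logical key across SHARD_COUNT partitions
          const shard = Math.floor(Math.random() * SHARD_COUNT);
          await docClient
            .put({
              TableName: 'events',
              Item: { pk: `${date}#${shard}`, sk: new Date().toISOString(), payload },
            })
            .promise();
        }
        // Reads must then query all shards for a date and merge the results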

    24. Large attributes compression

    If storing large blobs outside of DynamoDB, e.g. in S3, is not an option because of increased latency, compress them with a common algorithm like GZIP before saving them to DynamoDB.
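
    A sketch using Node's built-in zlib (table and attribute names are made up):

        const AWS = require('aws-sdk');
        const zlib = require('zlib');

        const docClient = new AWS.DynamoDB.DocumentClient();

        async function roundTrip(largeObject) {
          // Compress before writing; DocumentClient stores Buffers as Binary
          const body = zlib.gzipSync(JSON.stringify(largeObject));
          await docClient.put({ TableName: 'documents', Item: { id: 'doc-1', body } }).promise();

          // Decompress after reading
          const { Item } = await docClient
            .get({ TableName: 'documents', Key: { id: 'doc-1' } })
            .promise();
          return JSON.parse(zlib.gunzipSync(Item.body).toString());
        }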

    25. Date formats

    Use the epoch date format (in seconds) if you want to support DynamoDB's TTL feature and all the filters. If you don't need TTL, you can also use the ISO 8601 format; it works with filters like "between" too.

    26. Use DynamoDB Local

    If you need to mock your DynamoDB table(s) for local development or tests on your machine, you can use Localstack or DynamoDB Local to run DynamoDB as a Docker container without connectivity to the cloud. It supports most of the DynamoDB APIs.
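
    A sketch of pointing the SDK at a local instance, e.g. one started with docker run -p 8000:8000 amazon/dynamodb-local:

        const AWS = require('aws-sdk');

        const docClient = new AWS.DynamoDB.DocumentClient({
          region: 'local',
          endpoint: 'http://localhost:8000',
          // DynamoDB Local accepts any credentials
          accessKeyId: 'fake',
          secretAccessKey: 'fake',
        });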

    27. Remember to backup

    Before going live in production, remember to enable point-in-time recovery, so there's an option to roll back your table in case of an error.
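
    Point-in-time recovery can be enabled via the console, infrastructure-as-code, or a single API call:

        const AWS = require('aws-sdk');
        const dynamodb = new AWS.DynamoDB();

        async function enablePitr(tableName) {
          await dynamodb
            .updateContinuousBackups({
              TableName: tableName,
              PointInTimeRecoverySpecification: { PointInTimeRecoveryEnabled: true },
            })
            .promise();
        }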

    28. Starting to use DynamoDB? Try PartiQL

    If you're just starting to use DynamoDB, you might be initially discouraged by its unconventional query syntax. If you want to use SQL-like query language to work with DynamoDB, you can do that using PartiQL.
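
    A sketch using the executeStatement API (the 'users' table is made up; note that results come back in the low-level attribute format):

        const AWS = require('aws-sdk');
        const dynamodb = new AWS.DynamoDB();

        async function getUser(id) {
          const { Items } = await dynamodb
            .executeStatement({
              // PartiQL reads like SQL; use ? parameters for values
              Statement: 'SELECT * FROM users WHERE id = ?',
              Parameters: [{ S: id }],
            })
            .promise();
          return Items.map((item) => AWS.DynamoDB.Converter.unmarshall(item));
        }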

    29. Use S3 Export and Athena for full-table analytics

    If you need to perform whole-table operations like SELECT COUNT WHERE, export your table to S3 first and then use Athena or any other suitable tool to do so.

    30. Use TTL to remove expired items

    TTL (Time-to-Live) not only removes expired items from your table for free, it also reduces the overall cost of your tables! Learn more about TTL in DynamoDB.
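
    A sketch: enable TTL once on a hypothetical 'sessions' table, then write items with an epoch-seconds expiry attribute:

        const AWS = require('aws-sdk');

        const dynamodb = new AWS.DynamoDB();
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function setupAndWrite() {
          // One-time setup: tell DynamoDB which attribute holds the expiry timestamp
          await dynamodb
            .updateTimeToLive({
              TableName: 'sessions',
              TimeToLiveSpecification: { AttributeName: 'expiresAt', Enabled: true },
            })
            .promise();

          // Items whose 'expiresAt' (epoch seconds) is in the past get deleted for free
          await docClient
            .put({
              TableName: 'sessions',
              Item: { id: 'abc', expiresAt: Math.floor(Date.now() / 1000) + 24 * 60 * 60 },
            })
            .promise();
        }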

    31. Avoid using Scans

    DynamoDB has two primary methods of fetching collections of data: Query and Scan. While they might seem similar, you should always favor Queries over Scans.

    32. Distribute items across key space

    DynamoDB relies heavily on the concept of partitions. It divides the key space into multiple partitions and distributes the items across them. It is extremely important to distribute reads and writes evenly across the key space and partitions because otherwise one of your partitions might become "hot" and some operations might get throttled.

    33. In Single-Table designs, use a "type" attribute

    It allows you to quickly distinguish item types without inspecting keys or checking for the presence of particular attributes.

    34. Use GSIs and LSIs to enable new access patterns

    Instead of using Scans to fetch data that isn't easy to get with Queries, use Global Secondary Indexes to index the data on the required fields. This allows you to fetch data with fast Queries based on the attribute values of the item.
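
    For example, querying a hypothetical 'byEmail' GSI instead of scanning the whole table for an e-mail address:

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function findByEmail(email) {
          const { Items } = await docClient
            .query({
              TableName: 'users',
              IndexName: 'byEmail', // GSI with 'email' as its partition key
              KeyConditionExpression: 'email = :e',
              ExpressionAttributeValues: { ':e': email },
            })
            .promise();
          return Items[0];
        }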

    35. Use generic GSI and LSI names

    "The only constant thing is change", especially in the software development. Requirements change frequently, and it is important to have a generic name for your indexes so you can avoid doing migrations as they change.

    36. Use LeadingKeys for fine-grained IAM control

    For fine-grained access control, e.g. in multi-tenant applications, use an IAM policy with the dynamodb:LeadingKeys condition. It restricts access so that users can only read/write items whose partition key matches their identity.
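
    A sketch of such a policy, allowing access only to items whose partition key equals the caller's Cognito identity ID (the 'users' table is made up):

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
              "Resource": "arn:aws:dynamodb:*:*:table/users",
              "Condition": {
                "ForAllValues:StringEquals": {
                  "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
                }
              }
            }
          ]
        }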

    37. Get around item size limit

    DynamoDB items are limited to 400KB. If your entities are bigger than that, consider denormalizing and splitting them into multiple items sharing the same partition key. A useful mental model is to think of the partition key as a directory/folder and the sort key as a file name: once you're in the correct folder, getting data from any file within it is pretty straightforward.

    38. Leverage sort keys flexibility

    With composite sort keys, you can define hierarchical relationships in your data. This way, you can query it at any level of the hierarchy and achieve multiple access patterns with just one field.

    For example, with a key structured like [company]#[department]#[team] and the begins_with clause, you can get (see the sketch after this list):

    • All members of a company
    • All members of a department within a company
    • All members of a team within a department
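
    A sketch of the department-level query (table and key names are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function departmentMembers(company, department) {
          const { Items } = await docClient
            .query({
              TableName: 'staff',
              KeyConditionExpression: 'pk = :pk AND begins_with(sk, :prefix)',
              ExpressionAttributeValues: {
                ':pk': 'MEMBERS',
                ':prefix': `${company}#${department}#`, // [company]#[department]#
              },
            })
            .promise();
          return Items;
        }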

    39. Use ConditionExpressions

    If you need to insert data conditionally, use ConditionExpressions instead of getting an item, checking its properties, and then calling a Put operation. The get-then-put approach takes two calls, is more complicated, and is not atomic.
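
    A sketch of an atomic "create if not exists" put (table and key are made up):

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function createUser(user) {
          try {
            await docClient
              .put({
                TableName: 'users',
                Item: user,
                // Fails if an item with this key already exists
                ConditionExpression: 'attribute_not_exists(id)',
              })
              .promise();
          } catch (err) {
            if (err.code !== 'ConditionalCheckFailedException') throw err;
            console.log('User already exists');
          }
        }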

    40. Inverted Index

    Inverted indexes are secondary indexes that use the hash (partition) key of your table as the sort/range key of the index, and the sort key of your table as the partition/hash key of the index. This technique is useful e.g. in many-to-many relationships where you need to query the other side of a bi-directional relationship.
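
    For example, in a hypothetical 'memberships' table keyed pk = groupId, sk = userId, the base table answers "who is in this group?", while an inverted GSI answers the reverse question:

        const AWS = require('aws-sdk');
        const docClient = new AWS.DynamoDB.DocumentClient();

        async function groupsForUser(userId) {
          const { Items } = await docClient
            .query({
              TableName: 'memberships',
              IndexName: 'inverted', // GSI: partition key = sk, sort key = pk
              KeyConditionExpression: 'sk = :u',
              ExpressionAttributeValues: { ':u': userId },
            })
            .promise();
          return Items;
        }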
