DynamoDB Indexes Explained [Global & Local Secondary]
Written by Rafal Wilinski
Published on June 16th, 2020
- DynamoDB Local Secondary Index (LSI)
- DynamoDB Global Secondary Index (GSI)
- Sparse index
- Inverted index
Secondary indexes are what make DynamoDB much more than a simple key-value store. They let you quickly query and look up items based not only on the primary key attributes but also on attributes of your choice. Unlike primary keys, secondary indexes are not required, and their keys don't have to be unique. Generally speaking, they allow much more flexible query access patterns.
Different DynamoDB Indexes
There are two types of secondary indexes, local and global, plus a couple of common design patterns built on top of them:
DynamoDB Local Secondary Index (LSI)
Local Secondary Indexes use the same partition (hash) key as the primary index but let you use a different sort key. This also means they can be created only on tables with a composite primary key (partition key plus sort key).
Additionally, LSIs:
- Limit each item collection (all items that share a partition key value, across the table and all of its LSIs) to 10 GB.
- Unlike GSIs, share throughput with the base table: if you query data through an LSI, the reads are counted against the capacity of the base table.
- Have to be specified at table creation: you can't add or remove them after the table has been provisioned.
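To make the LSI rules concrete, here is a minimal sketch of the parameters for a CreateTable request that declares one LSI. The table name Orders, the attributes customerId, orderId and createdAt, the index name byCreatedAt, and the helper functions are all hypothetical. The dict follows the shape that boto3's client.create_table(**params) accepts, but the sketch only builds and checks it and never contacts AWS.

```python
# Sketch: CreateTable parameters for a table with one LSI.
# All table, attribute and index names are hypothetical; the dict mirrors
# what boto3's client.create_table(**params) accepts. Nothing is sent to AWS.


def orders_table_params() -> dict:
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "customerId", "AttributeType": "S"},
            {"AttributeName": "orderId", "AttributeType": "S"},
            {"AttributeName": "createdAt", "AttributeType": "S"},
        ],
        # LSIs require a composite primary key (HASH + RANGE).
        "KeySchema": [
            {"AttributeName": "customerId", "KeyType": "HASH"},
            {"AttributeName": "orderId", "KeyType": "RANGE"},
        ],
        # LSIs can only be declared here, when the table is created.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "byCreatedAt",
                "KeySchema": [
                    {"AttributeName": "customerId", "KeyType": "HASH"},  # same partition key
                    {"AttributeName": "createdAt", "KeyType": "RANGE"},  # different sort key
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def key_attr(key_schema: list, key_type: str) -> str:
    """Return the attribute name that plays the given role (HASH or RANGE)."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == key_type)


def lsis_are_valid(params: dict) -> bool:
    """Hypothetical check: each LSI reuses the table's partition key with another sort key."""
    table_hash = key_attr(params["KeySchema"], "HASH")
    table_range = key_attr(params["KeySchema"], "RANGE")
    return all(
        key_attr(lsi["KeySchema"], "HASH") == table_hash
        and key_attr(lsi["KeySchema"], "RANGE") != table_range
        for lsi in params.get("LocalSecondaryIndexes", [])
    )
```

With boto3 installed and credentials configured, the same dict could be passed as client.create_table(**orders_table_params()).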
DynamoDB Global Secondary Index (GSI)
Global Secondary Indexes are much more flexible and don't have the LSI limitations. They can use a different partition key from the table, and they can also be created on a table with a simple (partition-key-only) key schema. Most of the time, you'll find yourself using GSIs rather than LSIs because they enable much more flexible query access patterns.
Moreover, GSIs:
- Don't have the 10 GB per partition key limit that LSIs have.
- Don't have to be unique: two items can have the same partition and sort key pair on a GSI.
- Don't share throughput with the base table. Each GSI has its own capacity and is billed independently, so read throttling is separate too. Writes are the exception: if a GSI doesn't have enough write capacity to keep up, DynamoDB throttles writes to the base table.
- Can be added and deleted after the table has been created. If you add a GSI on an attribute that existing items already have, DynamoDB backfills the index with those items.
- Are eventually consistent: when you write an item to the table, the change is propagated asynchronously to its GSIs, so a query against a GSI may briefly return stale results. Keep that in mind when designing your application. If you need the most up-to-date data, use strongly consistent reads on the base table (or an LSI). They consume twice as much read capacity and may have slightly higher latency, and they are not available on GSIs, but they reflect the latest state of the data.
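Two of the points above, adding a GSI after the table exists and the lack of strongly consistent reads on GSIs, can be sketched as request parameters. The table Users, the attribute email, the index byEmail and the guard function are hypothetical. The dicts mirror the shapes boto3's client.update_table(**params) and client.query(**params) accept, and the sketch assumes an on-demand table, so the new GSI needs no ProvisionedThroughput block.

```python
# Sketch: add a GSI to an existing table, and query it with eventual consistency.
# Names are hypothetical; dicts mirror boto3's update_table / query parameters.
# Assumes an on-demand (PAY_PER_REQUEST) table, so no ProvisionedThroughput.


def add_email_gsi_params() -> dict:
    return {
        "TableName": "Users",
        # Any new key attribute must be declared in AttributeDefinitions.
        "AttributeDefinitions": [{"AttributeName": "email", "AttributeType": "S"}],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "byEmail",
                    "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                }
            }
        ],
    }


def gsi_query_params(email: str, consistent_read: bool = False) -> dict:
    # DynamoDB rejects ConsistentRead=True on a GSI; this hypothetical guard fails early.
    if consistent_read:
        raise ValueError("GSIs only support eventually consistent reads")
    return {
        "TableName": "Users",
        "IndexName": "byEmail",
        "KeyConditionExpression": "#e = :email",
        "ExpressionAttributeNames": {"#e": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
        "ConsistentRead": False,
    }
```

Once the index finishes backfilling, queries built this way may still briefly miss very recent writes, which is the eventual consistency described above.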
Sparse index
A sparse index is a GSI that covers only a subset of the collection because it is keyed on an attribute that not every item has. This technique is useful for quickly querying a set of items with a specific characteristic, e.g. only rows that have a deletedAt attribute defined. To create a sparse index, make sure that only the items that should appear in the index have the index's key attributes (for example, its sort key) set; DynamoDB leaves every other item out of the index.
Sparse indexes are a great replacement for queries whose FilterExpression only checks that an attribute exists (attribute_exists).
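The effect of a sparse index can be sketched with a small in-memory model. This is a hypothetical illustration, not DynamoDB code: it encodes the rule that an item is written to an index only if it has all of that index's key attributes, and compares that with filtering every item after reading it. The attribute names entityType and deletedAt are assumptions.

```python
# Sketch: how a sparse index is populated versus a filtered full read.
# In-memory model only; "entityType" and "deletedAt" are hypothetical names.
from typing import Any, Dict, List

Item = Dict[str, Any]


def sparse_index(items: List[Item], key_attrs: List[str]) -> List[Item]:
    """Keep only the items that have every index key attribute."""
    return [item for item in items if all(attr in item for attr in key_attrs)]


def scan_with_filter(items: List[Item], attr: str) -> List[Item]:
    """The alternative: read everything, then drop items missing the attribute."""
    return [item for item in items if attr in item]


table = [
    {"id": "1", "entityType": "doc", "title": "a"},
    {"id": "2", "entityType": "doc", "title": "b", "deletedAt": "2020-06-01"},
    {"id": "3", "entityType": "doc", "title": "c"},
    {"id": "4", "entityType": "doc", "title": "d", "deletedAt": "2020-06-10"},
]

# Index keyed on entityType (partition) and deletedAt (sort).
deleted_index = sparse_index(table, ["entityType", "deletedAt"])
```

Both approaches end up with items 2 and 4, but the filtered read has to read all four items (a FilterExpression is applied after the read, so it does not reduce consumed capacity), while reading the sparse index only touches the two items stored in it.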
Inverted index
An inverted index is a GSI whose key schema is the table's primary key flipped: the table's partition (hash) key becomes the index's sort key, and the table's sort key becomes the index's partition key.
Inverted Indexes are helpful in bidirectional many-to-many relationships, e.g. defining connections between cities or friendships between users.
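A hypothetical in-memory model shows why flipping the keys helps with many-to-many relationships. Each relationship is stored once as an item with PK set to one user and SK set to the other. Grouping by PK answers "whom does alice list?", and grouping by SK (what an inverted GSI does) answers "who lists bob?". The PK/SK names and the build_index helper are assumptions for illustration.

```python
# Sketch: base index vs inverted index over one set of relationship items.
# In-memory model; in DynamoDB the inverted index would be a GSI with
# partition key SK and sort key PK. Names are hypothetical.
from collections import defaultdict
from typing import Dict, List

Item = Dict[str, str]


def build_index(items: List[Item], hash_attr: str, range_attr: str) -> Dict[str, List[Item]]:
    """Group items by hash_attr and sort each group by range_attr."""
    index: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        index[item[hash_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_attr])
    return dict(index)


friendships = [
    {"PK": "alice", "SK": "bob"},
    {"PK": "carol", "SK": "bob"},
    {"PK": "alice", "SK": "dave"},
]

base = build_index(friendships, "PK", "SK")      # "whom does alice list?"
inverted = build_index(friendships, "SK", "PK")  # "who lists bob?"
```

Without the inverted index, answering "who lists bob?" would need a full scan, or a second copy of every relationship written in the opposite direction.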
Frequently Asked Questions
What are the GSI/LSI use cases?
Global/Local secondary indexes are useful if you need to query for some data fast, and that query pattern is not possible using the table's main index. Examples include:
- A table of users where the partition key is uuid and the sort key is createdAt, but you also want to be able to find a user by email -> add a GSI with email as its partition key.
- A table of products where the partition key is productId and the sort key is category, but you also want to be able to find all products within one category. In this case, the GSI can be an inverted index: category becomes the GSI's partition key and productId becomes its sort key.
- A metadata table where you want to be able to find all soft-deleted items. You can provision a sparse index that uses deletedAt as its sort key; since only soft-deleted items have that attribute, any Query or Scan against the index returns only those items.
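For the three use cases above, the read requests might look like the following sketch. The table names (Users, Products, Metadata), the index names (emailIndex, categoryIndex, deletedIndex) and the helper functions are hypothetical. The dicts use the low-level attribute-value format that boto3's client.query and client.scan accept.

```python
# Sketch: request parameters for the use cases above (boto3 low-level format).
# All table and index names are hypothetical.


def find_user_by_email(email: str) -> dict:
    return {
        "TableName": "Users",
        "IndexName": "emailIndex",  # GSI with partition key email
        "KeyConditionExpression": "#email = :email",
        "ExpressionAttributeNames": {"#email": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


def products_in_category(category: str) -> dict:
    return {
        "TableName": "Products",
        "IndexName": "categoryIndex",  # inverted GSI: PK category, SK productId
        "KeyConditionExpression": "#category = :category",
        "ExpressionAttributeNames": {"#category": "category"},
        "ExpressionAttributeValues": {":category": {"S": category}},
    }


def soft_deleted_items() -> dict:
    # Only items with deletedAt are stored in the sparse index,
    # so a Scan of it needs no FilterExpression.
    return {"TableName": "Metadata", "IndexName": "deletedIndex"}
```

The first two would be passed to client.query(**params) and the last to client.scan(**params).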
What is the difference between primary index and secondary index in DynamoDB?
The key difference is that the primary index must exist, while secondary indexes are optional. The primary key uniquely identifies each item in the table, while secondary indexes let you query the same items using other attributes, and their keys don't have to be unique.
Does DynamoDB have index based throttle?
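Yes and no. Because GSIs have their own, separate throughput, reads on a GSI are throttled independently of the base table. Writes are different: if a GSI doesn't have enough write capacity to absorb the changes, writes to the base table are throttled as well. LSIs, on the other hand, don't have index-based throttling; they share both throughput and throttling with the base table and its other LSIs.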
How many secondary indexes can you create per table in DynamoDB?
You can create up to 20 global secondary indexes (the default quota, which can be raised on request) and 5 local secondary indexes per table.
What is the DynamoDB index cost?
There is no separate charge for defining DynamoDB indexes, but you pay for the capacity and storage they use. Whenever a write to the table affects an index, the index write consumes write capacity too (a GSI's own WCUs, or the base table's WCUs for an LSI). For example, if you write a new item to a table with 4 GSIs and the item has the key attributes of all four, you pay for five writes: one to the table and one to each GSI. An update that changes an indexed key attribute costs two writes on that index (a delete of the old index item and a put of the new one). For more write scenarios, head to the AWS docs. There are also storage considerations: each index item takes up the size of the base table's primary key, plus the size of the index key attributes, plus the size of any projected attributes, plus 100 bytes of overhead.
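The write amplification can be estimated with a small calculator. This is a hypothetical sketch of the index write rules as summarized in the AWS documentation, not an AWS API. It counts write requests rather than WCUs: each write still costs 1 WCU per started 1 KB of item size. It also assumes every index uses the ALL projection, so any change to an item that stays in an index has to be rewritten there.

```python
# Sketch: write requests caused by one table write, plus index storage size.
# Hypothetical calculator; assumes every index uses the ALL projection.
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]
INDEX_ITEM_OVERHEAD_BYTES = 100  # storage overhead per index item


def index_writes(old: Optional[Item], new: Optional[Item], key_attrs: List[str]) -> int:
    """Writes one index needs when a table item changes from old to new."""

    def in_index(item: Optional[Item]) -> bool:
        return item is not None and all(attr in item for attr in key_attrs)

    before, after = in_index(old), in_index(new)
    if not before and not after:
        return 0  # the item never appears in this index
    if before != after:
        return 1  # the item enters (put) or leaves (delete) the index
    if any(old[attr] != new[attr] for attr in key_attrs):
        return 2  # key changed: delete the old index item, put the new one
    return 1  # same keys: rewrite the projected attributes


def total_writes(old: Optional[Item], new: Optional[Item], indexes: List[List[str]]) -> int:
    """One base-table write plus the writes for every index."""
    return 1 + sum(index_writes(old, new, keys) for keys in indexes)


def index_item_storage_bytes(table_key_bytes: int, index_key_bytes: int, projected_bytes: int = 0) -> int:
    """Storage for one index item: table key + index key + projection + overhead."""
    return table_key_bytes + index_key_bytes + projected_bytes + INDEX_ITEM_OVERHEAD_BYTES
```

For instance, putting a new item that carries the keys of four GSIs gives total_writes(None, item, [["email"], ["category"], ["createdAt"], ["status"]]) == 5. Changing only its email afterwards gives 6 (two writes on the email index, one on each of the other three, plus the table write).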
How can I read data from indexes?
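Unfortunately, you cannot use GetItem and BatchGetItem on secondary indexes (GSIs or LSIs); those operations only read from the base table by primary key. However, you can still use Query and Scan on indexes by passing the index name (IndexName).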
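Because Query and Scan return at most 1 MB of data per call, reading a large index means following LastEvaluatedKey from each response and passing it back as ExclusiveStartKey. The sketch below shows that loop. FakeClient is a hypothetical stand-in so the example runs without AWS, and its page "keys" are simplified placeholders; a real boto3.client("dynamodb") exposes the same query(**params) method.

```python
# Sketch: read every page of a Query against an index.
# FakeClient is a hypothetical stand-in for boto3's DynamoDB client.
from typing import Any, Dict, Iterator, List


def query_all(client: Any, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield items from every page of a Query (works with IndexName too)."""
    start_key = None
    while True:
        page_params = dict(params)
        if start_key is not None:
            page_params["ExclusiveStartKey"] = start_key
        page = client.query(**page_params)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return


class FakeClient:
    """Serves canned pages; its page "keys" are simplified placeholders."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.pages = pages
        self.calls: List[Dict[str, Any]] = []

    def query(self, **params: Any) -> Dict[str, Any]:
        self.calls.append(params)
        start = params.get("ExclusiveStartKey")
        return self.pages[0 if start is None else start["page"]]
```

boto3 also ships a built-in paginator, client.get_paginator("query"), that runs the same loop; writing it out by hand just makes the LastEvaluatedKey hand-off visible.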
Best Practices for Using DynamoDB Indexes
When designing your DynamoDB tables and indexes, consider the following best practices:
- Plan your access patterns: Before creating your table, think about how you will query your data. This will help you decide which attributes to index.
- Use sparse indexes wisely: Sparse indexes can save storage and improve query performance by indexing only a subset of items.
- Monitor and adjust throughput: Regularly monitor your table and index usage to ensure you have provisioned the appropriate throughput. Adjust as necessary to avoid throttling.
- Leverage GSIs for flexibility: Use GSIs to enable flexible query patterns that your primary index does not support.
- Understand eventual consistency: Be aware that GSIs are eventually consistent, which means there might be a delay before changes are reflected in the index. Use strongly consistent reads if immediate consistency is required, but note that they are not available for GSIs.