DynamoDB Indexes Explained [Global & Local Secondary]
Written by Rafal Wilinski
Published on 2020-06-16
Secondary indexes are what make DynamoDB much more than a simple key-value store. They let you quickly query and look up items based not only on the primary key attributes, but also on attributes of your choice. Unlike the primary key, secondary indexes are optional, and their key values don't have to be unique. Generally speaking, they allow much more flexible query access patterns.
Different DynamoDB Indexes
There are also a few flavors of "indexes":
Local Secondary Indexes (LSIs) use the same hash key as the primary index but allow you to use a different sort key. That also means that they can be created only on tables with a composite primary key.
- Limit you to only 10GB of data per Hash/Partition Key.
- Unlike GSIs, they share throughput with the base table - if you query for data using an LSI, the usage is counted against the capacity of the underlying table and its base index.
- They have to be specified at table creation - you can't add or remove them after provisioning the table.
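As a rough illustration of the points above, the sketch below builds the parameters for a CreateTable request that declares an LSI up front. The table, attribute, and index names (`Orders`, `customerId`, `orderId`, `createdAt`, `byCreatedAt`) are hypothetical, and every key attribute is assumed to be a string.

```python
def build_table_with_lsi(table_name, hash_key, table_sort_key, lsi_name, lsi_sort_key):
    """Build CreateTable parameters for a table with one Local Secondary Index.

    Assumption for this sketch: all key attributes are strings ("S").
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
            {"AttributeName": table_sort_key, "AttributeType": "S"},
            {"AttributeName": lsi_sort_key, "AttributeType": "S"},
        ],
        # Composite primary key: an LSI requires the table to have a sort key.
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
            {"AttributeName": table_sort_key, "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": lsi_name,
                # Same hash key as the table, different sort key.
                "KeySchema": [
                    {"AttributeName": hash_key, "KeyType": "HASH"},
                    {"AttributeName": lsi_sort_key, "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


# Hypothetical names, used only for illustration.
params = build_table_with_lsi("Orders", "customerId", "orderId", "byCreatedAt", "createdAt")
```

With boto3 installed and credentials configured, these parameters could be sent with `boto3.client("dynamodb").create_table(**params)`. Note that the LSI carries no throughput settings of its own, since it shares capacity with the base table.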
Global Secondary Indexes (GSIs) are much more flexible and don't have the LSI limitations. They don't have to share the same hash key as the table, and they can also be created on a table with a simple key schema. Most of the time, you'll find yourself using GSIs over LSIs because they enable much more flexible query access patterns.
- Don't have that 10GB of data per Hash/Partition Key limit like LSIs have.
- Global Secondary Indexes don't have to be unique! Two items can have the same partition key and sort key pair on a GSI.
- They don't share throughput with the base table. Each of the GSIs is billed independently, and throttling is also separated as a consequence.
- GSIs can be altered after the table has been created. You can add and delete them whenever you want. Moreover, if you create a GSI on an attribute that already exists on items in the table, the index will also be backfilled with that data.
- They are eventually consistent - when you write an item to a table, the change is propagated asynchronously to the GSIs. This means that query results might briefly be stale, and you should account for that when designing your application. "Strongly Consistent Reads", which reflect the most up-to-date state of the database, consume twice as much read capacity and might have slightly higher latency, but you cannot use them on GSIs; they are available only on the base table and on LSIs.
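Because GSIs can be added after the table exists, a minimal sketch of the UpdateTable parameters for creating one might look like this. The names `Users`, `byEmail`, and `email` are hypothetical, key attributes are assumed to be strings, and the table is assumed to use on-demand billing (a table with provisioned capacity would also need a `ProvisionedThroughput` entry inside `Create`).

```python
def build_add_gsi_update(table_name, index_name, partition_key, sort_key=None):
    """Build UpdateTable parameters that add a GSI to an existing table.

    Assumptions for this sketch: string key attributes and an on-demand table.
    """
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        # The sort key is optional: a GSI can have a simple key schema.
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "TableName": table_name,
        "AttributeDefinitions": attribute_definitions,
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": key_schema,
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


# Hypothetical names, used only for illustration.
update = build_add_gsi_update("Users", "byEmail", "email")
```

With boto3, this could be sent with `boto3.client("dynamodb").update_table(**update)`; existing items that have the key attribute are then backfilled into the new index.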
A Sparse Index is a special type of GSI that allows you to index only a subset of the collection by indexing an attribute that is not present on all the items. This technique is useful to quickly query for a set of items that have a specific attribute, e.g. only rows that have an attribute deletedAt defined. To create a sparse index, make sure that only the items that should be in that index have a value for that index's key attribute.
Sparse Indexes are a great replacement for queries with FilterExpressions that include a
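The sparse-index behaviour can be mimicked in plain Python to show which items would end up in the index. This is only an in-memory sketch, not a DynamoDB call, and the item data is made up for illustration.

```python
def sparse_index(items, index_key):
    """Return only the items that carry the index key attribute.

    This mirrors how a sparse index holds just the items that define
    the attribute used as the index key.
    """
    return [item for item in items if index_key in item]


# Made-up sample items; only two of them define deletedAt.
items = [
    {"id": "a", "name": "first"},
    {"id": "b", "name": "second", "deletedAt": "2020-06-01"},
    {"id": "c", "name": "third", "deletedAt": "2020-06-10"},
]

soft_deleted = sparse_index(items, "deletedAt")
print([item["id"] for item in soft_deleted])  # prints ['b', 'c']
```

Reading the index returns only the soft-deleted items, so there is no need to read every item and filter afterwards.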
An Inverted Index is a GSI that is basically the primary key inverted - the table's hash key becomes the inverted index's sort key and the table's sort key becomes the inverted index's hash key.
Inverted Indexes are helpful in bidirectional many-to-many relationships, e.g. defining connections between cities or friendships between users.
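The key swap described above can be sketched as a small helper that derives the inverted index's key schema from the table's. The `userId` and `friendId` attribute names are hypothetical.

```python
def invert_key_schema(table_key_schema):
    """Swap the HASH and RANGE attributes of a composite key schema."""
    by_type = {entry["KeyType"]: entry["AttributeName"] for entry in table_key_schema}
    if set(by_type) != {"HASH", "RANGE"}:
        raise ValueError("an inverted index needs a composite primary key")
    return [
        {"AttributeName": by_type["RANGE"], "KeyType": "HASH"},
        {"AttributeName": by_type["HASH"], "KeyType": "RANGE"},
    ]


# Hypothetical friendships table: userId is the hash key, friendId the sort key.
table_keys = [
    {"AttributeName": "userId", "KeyType": "HASH"},
    {"AttributeName": "friendId", "KeyType": "RANGE"},
]
inverted_keys = invert_key_schema(table_keys)
```

Querying the table by `userId` would list a user's friends, while querying the inverted index by `friendId` would list everyone who has that user as a friend.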
Frequently Asked Questions
What are the GSI/LSI use cases?
Global/Local secondary indexes are useful if you need to query for some data fast, and that query pattern is not possible using the table's primary index. Examples include:
- A table of users where the partition key is uuid and the sort key is createdAt, but you also want to be able to find a user by email -> add a GSI that uses the email attribute as its partition key.
- A table of products where the partition key is productId and the sort key is category, but you also want to be able to find all products within one category. In such a case, the GSI can be an inverted table index - category becomes the GSI's partition key and productId becomes its sort key.
- A metadata table where you want to be able to find all softly deleted items. You can provision a sparse index which uses deletedAt as a sort key and issue queries against it, since only items with that attribute appear in the index.
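For the first use case, a lookup by email through a GSI could be sketched as below. The table name `Users` and index name `byEmail` are hypothetical, and the parameters follow the low-level Query request shape, where values are typed (`{"S": ...}` for strings).

```python
def build_email_lookup(table_name, index_name, email):
    """Build Query parameters that look up items by email on a GSI."""
    return {
        "TableName": table_name,
        # Naming the index is what directs the query to the GSI
        # instead of the table's primary index.
        "IndexName": index_name,
        "KeyConditionExpression": "#email = :email",
        # A placeholder name sidesteps any clash with reserved words.
        "ExpressionAttributeNames": {"#email": "email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


# Hypothetical names and address, used only for illustration.
lookup = build_email_lookup("Users", "byEmail", "jane@example.com")
```

With boto3, the request could be issued as `boto3.client("dynamodb").query(**lookup)`. Since GSI keys are not unique, the response may contain more than one item for the same email.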
What is the difference between primary index and secondary index in DynamoDB?
The key difference is that the primary index has to exist while secondary indexes are optional. The primary index uniquely identifies each item in the table, while secondary indexes only allow you to query items using different attributes.
Does DynamoDB have index based throttle?
Yes and no. Because GSIs have their own, separate throughput, throttling on them is also independent of the base table. On the other hand, LSIs don't have index-based throttling. They share both throughput and throttling with the base table and its other LSIs.
How many secondary indexes can you create per table in DynamoDB?
By default, you can create up to 20 global secondary indexes and 5 local secondary indexes per table.
What is the DynamoDB index cost?
Setting up DynamoDB indexes is free of charge. However, when you write data to the table and an index is affected by that write, the index update consumes WCUs as well. This means that if you write an item to a table with 4 GSIs and that item has 4 attributes which are indexed by these GSIs, you pay for 4 index writes on top of the write to the base table. For more writing scenarios, head to AWS docs. Moreover, there are also storage considerations: 100 bytes of overhead per index item, the size in bytes of the projected attributes (if any), and the size of the index key attributes.
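The write-cost arithmetic can be sketched with a small estimator. It rests on several simplifying assumptions: a standard (non-transactional) write of a brand-new item, one write unit per started 1 KB, and one index write per affected GSI sized by that index's projected entry. Real costs depend on the scenarios covered in the AWS docs, and the sizes below are made up.

```python
import math


def write_units(size_bytes):
    """Write units for one standard write: one unit per started 1 KB."""
    return max(1, math.ceil(size_bytes / 1024))


def estimate_put_write_units(item_size_bytes, index_entry_sizes):
    """Estimate write units for putting a new item that lands in several GSIs.

    index_entry_sizes holds the projected entry size in bytes for each
    affected index. Assumes no existing index entries need replacing.
    """
    table_units = write_units(item_size_bytes)
    index_units = sum(write_units(size) for size in index_entry_sizes)
    return table_units + index_units


# A 600-byte item indexed by four GSIs, each with a 200-byte projected entry.
total = estimate_put_write_units(600, [200, 200, 200, 200])
print(total)  # prints 5
```

Under these assumptions the write costs one unit for the base table plus one for each of the four indexes.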
How can I read data from indexes?
Unfortunately, you cannot use GetItem or BatchGetItem operations on GSIs. However, you can still use Queries and Scans on them.
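Since indexes are read through Query and Scan, results can arrive in pages. The sketch below collects all pages by following `LastEvaluatedKey`. It takes the query function as an argument (for example a boto3 client's `query` method), and `fake_query` is a made-up stand-in so the example runs without AWS access.

```python
def query_all(query_fn, **params):
    """Collect items from every page of a Query or Scan on a table or index."""
    items = []
    while True:
        page = query_fn(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


def fake_query(**params):
    """Hypothetical stand-in for a client's query call: serves two fixed pages."""
    start = params.get("ExclusiveStartKey")
    if start is None:
        return {"Items": [{"id": {"S": "1"}}], "LastEvaluatedKey": {"id": {"S": "1"}}}
    return {"Items": [{"id": {"S": "2"}}]}


# Hypothetical table and index names, used only for illustration.
results = query_all(fake_query, TableName="Users", IndexName="byEmail")
print(len(results))  # prints 2
```

With boto3, `query_all(boto3.client("dynamodb").query, TableName=..., IndexName=..., ...)` would follow the same pattern against a real index.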
© 2021 Dynobase