What Is DynamoDB Scan?
Scan is one of the three ways of getting data out of DynamoDB (alongside Get and Query), and it is the most brutal one because it reads everything. A Scan operation goes through the whole table, returning a collection of items and their attributes. The result of a Scan can be narrowed down using FilterExpressions.
To run a Scan operation from the AWS CLI, use the aws dynamodb scan command and pass the table name with the --table-name option.
Should I use DynamoDB Scans?
Generally speaking, no. Scans are expensive, slow, and against best practices. To fetch one item by key, you should use the Get operation, and if you need to fetch a collection of items, you should do that using Query.
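As a rough illustration of the two alternatives, here is a minimal Python sketch. It assumes a low-level Boto3 client (created elsewhere with boto3.client("dynamodb")) is passed in as `client`; the helper names `get_one` and `query_partition`, the table name, and the key attribute names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Attribute values use the low-level DynamoDB format, e.g. {"S": "user#1"}.
AttributeMap = Dict[str, Dict[str, Any]]


def get_one(client: Any, table_name: str, key: AttributeMap) -> Optional[AttributeMap]:
    """Fetch a single item by its full primary key, or None if it does not exist."""
    response = client.get_item(TableName=table_name, Key=key)
    # GetItem omits the "Item" field when no item matches the key.
    return response.get("Item")


def query_partition(
    client: Any, table_name: str, pk_name: str, pk_value: Dict[str, Any]
) -> List[AttributeMap]:
    """Fetch the first page of items that share one partition key value."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="#pk = :pk",
        # Placeholders avoid clashes with DynamoDB reserved words.
        ExpressionAttributeNames={"#pk": pk_name},
        ExpressionAttributeValues={":pk": pk_value},
    )
    return response.get("Items", [])
```

For example, `get_one(client, "users", {"pk": {"S": "user#1"}})` would read exactly one item, while `query_partition(client, "users", "pk", {"S": "user#1"})` would read only the items stored under that partition key. Neither call has to touch the rest of the table.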
When To Use DynamoDB Scan
Sometimes, however, using scans is unavoidable. In that case, use them sparingly and with knowledge of the consequences. Here are the use cases where scans might make sense:
- Getting all the items from the table because you want to remove or migrate them
- Reading a table that is really small (< 10 MB)
DynamoDB Scan Examples
After reading the above content, if you feel that the scan query still makes sense for your use-case, then we've got you covered. Here are different methods and scan query code snippets you can copy-paste.
- Create DynamoDB Scans (and Queries) Visually
- DynamoDB Scan in Node.js
- DynamoDB Scan in Python (using Boto3)
Similar to the Query operation, a single Scan call can return up to 1 MB of data. If the table contains more records than one Scan call can return, the API response includes a LastEvaluatedKey value, which tells the API where the next Scan operation should start. The returned value should be passed as the ExclusiveStartKey parameter of the subsequent call.
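The pagination loop described above can be sketched in Python as follows. This is a minimal sketch, assuming `client` is a low-level Boto3 DynamoDB client (boto3.client("dynamodb")) created by the caller; the function name `scan_all_items` is hypothetical.

```python
from typing import Any, Dict, Iterator


def scan_all_items(client: Any, table_name: str, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item in the table, following LastEvaluatedKey page by page."""
    kwargs: Dict[str, Any] = {"TableName": table_name, **scan_kwargs}
    while True:
        response = client.scan(**kwargs)
        yield from response.get("Items", [])

        # LastEvaluatedKey is absent once the last page has been read.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        # Resume the next Scan call right after the last evaluated item.
        kwargs["ExclusiveStartKey"] = last_key
```

Because the function is a generator, pages are fetched lazily: iterating with `for item in scan_all_items(client, "my-table")` (a hypothetical table name) issues the next Scan call only when the current page is exhausted.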
How fast is DynamoDB scan?
DynamoDB Scan is not a fast operation. Because it goes through the whole table to look for the data, it has O(n) computational complexity, where n is the number of items in the table. If you need to fetch data fast, use Query or Get operations instead.
What is the DynamoDB scan cost?
DynamoDB Scan cost depends on the amount of data it scans, not the amount of data it returns. Even if you narrow down the results returned by the API using FilterExpressions, you'll be billed for the amount of data it went through to find the relevant results.
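One way to see this difference is to compare the Count and ScannedCount fields of a Scan response and to ask for the consumed capacity. The sketch below assumes `client` is a low-level Boto3 DynamoDB client supplied by the caller; the function name `scan_with_stats` and the attribute used in the usage note are hypothetical.

```python
from typing import Any, Dict


def scan_with_stats(client: Any, table_name: str, **scan_kwargs: Any) -> Dict[str, Any]:
    """Run a single Scan call and report items read versus items returned."""
    response = client.scan(
        TableName=table_name,
        ReturnConsumedCapacity="TOTAL",  # ask DynamoDB to report capacity used
        **scan_kwargs,
    )
    consumed = response.get("ConsumedCapacity", {})
    return {
        "returned": response["Count"],        # items left after filtering
        "scanned": response["ScannedCount"],  # items read before filtering
        "capacity_units": consumed.get("CapacityUnits"),
    }
```

Calling it with a filter, for instance `scan_with_stats(client, "my-table", FilterExpression="#s = :s", ExpressionAttributeNames={"#s": "status"}, ExpressionAttributeValues={":s": {"S": "active"}})`, would typically show a "scanned" figure much larger than "returned", while the consumed capacity tracks the scanned data.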
Parallel Scan in DynamoDB
Scans are, generally speaking, slow. To make the process faster, you can use a feature called "Parallel Scan", which divides the whole DynamoDB table into Segments. A separate thread/worker then processes each Segment, so N workers can work simultaneously to go through the whole keyspace faster.
Creating a Parallel Scan is quite easy. Each of your workers, when issuing a Scan request, should include two additional parameters:
Segment - The zero-based index of the segment to be scanned by a particular worker
TotalSegments - Total number of Segments (typically one per worker/thread)
Be careful with Parallel Scans, though: they can drain your provisioned read capacity pretty quickly, incurring high costs and degrading the performance of your table.
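A Parallel Scan with one thread per segment could look roughly like the sketch below. It assumes `client` is a low-level Boto3 DynamoDB client created by the caller (Boto3 documents low-level clients as thread safe); the function names `scan_segment` and `parallel_scan` and the default of four segments are illustrative choices, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def scan_segment(
    client: Any, table_name: str, segment: int, total_segments: int
) -> List[Dict[str, Any]]:
    """Read every page of one segment (Segment is a zero-based index)."""
    items: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        response = client.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Each segment paginates independently of the others.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client: Any, table_name: str, total_segments: int = 4) -> List[Dict[str, Any]]:
    """Scan all segments concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        results: List[Dict[str, Any]] = []
        for future in futures:
            results.extend(future.result())  # re-raises any worker error
    return results
```

The merged list is ordered by segment rather than by key, and all of it is held in memory, so for large tables it is probably better to process each segment's pages as they arrive and to keep the number of segments low enough to stay within the table's read capacity.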
© 2020 Dynobase