Migrating from MongoDB to DynamoDB [Step-by-Step Guide]
Written by Charlie Fish
Published on 2022-04-17
Today let's take a look at migrating from MongoDB to DynamoDB. MongoDB and DynamoDB are both NoSQL databases, and both use JSON-based documents to store data. However, there are a lot of differences between the two that you need to be aware of when planning and executing your migration from MongoDB to DynamoDB.
There are many different ways to migrate from MongoDB to DynamoDB. Here is an outline of how I would think about the migration. This isn't a hard and fast outline, but it should give you a high-level idea of what to think about throughout the process.
We will go in-depth into these sections as we go through this guide.
- It is important to understand DynamoDB, how it works, and the differences and similarities between DynamoDB and MongoDB.
- This stage is when you start thinking about your specific application and plan how you will handle the differences between the two database platforms. This should include a deep dive into your current database infrastructure and usage patterns.
- It's time to start coding your migration. This will include calling the DynamoDB APIs and migrating your database code to use DynamoDB.
- Somehow you have to migrate your existing data from MongoDB to DynamoDB.
- Things do go wrong. It is important to pause and monitor your application to ensure everything is working as expected.
- There will likely be cleanup work to do. This can include things like removing old MongoDB code from your codebase, decommissioning MongoDB servers, etc.
There are a couple of important differences between the server infrastructure for MongoDB and DynamoDB.
MongoDB can be installed on any platform and is traditionally a self-managed system. You are responsible for scaling servers, managing maintenance, etc. Managed MongoDB options exist (such as MongoDB Atlas), but many MongoDB users still run and operate their own servers.
DynamoDB, by contrast, is a fully managed database platform provided by AWS. This means you don't have to worry about servers, maintenance, etc. You can simply create a table, set the capacity, and start reading/writing data. You can take this a step further with the DynamoDB auto scaling capacity features. Maintenance and updates are all handled by AWS, so you don't have to worry about that aspect of it.
It is important to note that you can install a DynamoDB Local instance for use with CI, testing, and development. It isn't really possible to scale this, and it is NOT recommended for production use. Still, the fact that DynamoDB is a fully managed service should not discourage you from using it, since DynamoDB Local covers your testing and development needs.
Luckily, there are a lot of similarities between items in MongoDB and DynamoDB. The foundation of both is basically the same, as they both use key-value pairs structured in a JSON-like object. Both are also schema-less, meaning you can store any type of data in your database without having to modify a schema. This can be an added benefit; however, it can sometimes be useful to have a schema to ensure that you are not storing data that you don't want. We will discuss this more in our Dynamoose section below.
DynamoDB supports many common data types such as strings, numbers, booleans, arrays (called lists), objects (called maps), and binary.
MongoDB also supports more complex data types such as geospatial data, dates, etc. DynamoDB does not support these data types, at least not natively. It is very feasible to reproduce these kinds of data types, though. For example, dates can easily be stored as a Unix timestamp number value, and geospatial data can be stored as an object or Geohash depending on the use case.
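As a rough illustration of the date handling described above, here is a minimal Python sketch that walks a MongoDB-style document and converts dates to Unix timestamps. The function name `to_dynamodb_value` and the sample document are hypothetical. Treating naive datetimes as UTC is an assumption you should check against your own data. Floats are converted to `Decimal` because that is the number type the Python SDK (boto3) expects.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_dynamodb_value(value):
    """Recursively convert a MongoDB-style value into DynamoDB-friendly types."""
    if isinstance(value, datetime):
        # Assumption: naive datetimes are stored in UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())  # Unix timestamp in seconds
    if isinstance(value, float):
        # boto3 expects Decimal rather than float for number attributes.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamodb_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb_value(item) for item in value]
    return value  # strings, ints, booleans, None, and bytes pass through


# Hypothetical document, shaped like something read from MongoDB.
mongo_doc = {
    "email": "ada@example.com",
    "createdAt": datetime(2022, 4, 17, 12, 0, 0),
    "rating": 4.5,
    "tags": ["admin", "beta"],
}
item = to_dynamodb_value(mongo_doc)
```

Storing the timestamp as a number keeps it sortable, which matters if you later want to use it as a sort key.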
Each item in DynamoDB is identified by its primary key, which consists of a partition key (also called the hash key) and, optionally, a sort key (also called the range key). If the table has a sort key, the combination of the partition key and the sort key must be unique. If it doesn't have a sort key, the partition key alone must be unique.
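To make the key structure concrete, here is a small sketch that builds the parameters for a CreateTable request. In the DynamoDB API, the first key is declared with `KeyType` `HASH` and the optional sort key with `RANGE`. The helper name, table name, and attribute names are hypothetical, and the sketch assumes both keys are strings and that on-demand billing is acceptable.

```python
def build_create_table_params(table_name, partition_key, sort_key=None):
    """Build CreateTable parameters. Assumes string ("S") key attributes."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key, "AttributeType": "S"}
    ]
    if sort_key is not None:
        # With a sort key, uniqueness applies to the pair of key values.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key, "AttributeType": "S"}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


# Hypothetical tables: one keyed by email only, one keyed by user and order.
users_params = build_create_table_params("Users", "email")
orders_params = build_create_table_params("Orders", "userId", "orderId")
```

With boto3, a dictionary like this could be passed as `client.create_table(**orders_params)`.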
Querying is probably one of the biggest differences between MongoDB and DynamoDB. We will look at each of the different ways to query data in DynamoDB so you can understand which method works best for each one of your use cases.
Before we get started on this section, it is highly recommended that you write down every access pattern for your data. What are the inputs and expected outputs for each access pattern? An example might be something like: "Get a user with the email address of _". Writing each of these access patterns down will help you visualize and understand the best option for using DynamoDB to read your data.
The first method to consider for retrieving data is the get command. This command retrieves a single item from your table. You provide the item's key (the partition key, plus the sort key if your table has one), and DynamoDB returns the item that matches this key. This command is incredibly efficient and scales extremely well with DynamoDB's architecture.
The next method is the batchGet command. This command is just like get, except you can pass in an array of keys, and you get back an array of items. The efficiency here is that you don't have to make multiple separate calls to DynamoDB to retrieve all of your data.
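One practical detail: a single BatchGetItem request accepts at most 100 keys, so larger key lists have to be split up. The sketch below builds one request parameter dictionary per group of keys. The helper names, the `Users` table, and the `email` key attribute are hypothetical.

```python
def chunk(keys, size=100):
    """Yield successive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def build_batch_get_requests(table_name, keys):
    """Build one BatchGetItem parameter dict per group of up to 100 keys."""
    return [
        {"RequestItems": {table_name: {"Keys": group}}}
        for group in chunk(keys)
    ]


# Hypothetical keys in the low-level attribute-value format.
keys = [{"email": {"S": f"user{i}@example.com"}} for i in range(250)]
requests = build_batch_get_requests("Users", keys)
```

A real implementation would also need to check the `UnprocessedKeys` field of each response and retry those keys.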
Next is the query command. With the query command, you search a table or index for all items whose hash key equals a given value. You can then narrow the results efficiently by applying comparison operators to the range key. Finally, you can add additional filters that, as discussed below, don't increase the efficiency of your query but do save you bandwidth. This is by far the most efficient way to search through your DynamoDB data. Although its functionality seems limited on the surface, if you are able to change the way you think about modeling your data, this is an extremely powerful way to search your data. When people praise DynamoDB's scalability, this is truly what they are referring to. DynamoDB makes query commands incredibly efficient.
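The three parts of a query map onto separate request parameters. Here is a sketch of a Query request in the low-level format. The table and attribute names (`Orders`, `userId`, `createdAt`, `status`) and the helper name are hypothetical. `KeyConditionExpression` carries the hash key equality and the range key comparison, while `FilterExpression` is the extra filter that only trims the response.

```python
def build_query_params(table_name, user_id, created_after):
    """Build Query parameters: key condition on the keys, plus a filter."""
    return {
        "TableName": table_name,
        # Efficient part: hash key equality and a range key comparison.
        "KeyConditionExpression": "#pk = :pk AND #sk > :after",
        # Applied after the matching items are read; saves bandwidth only.
        "FilterExpression": "#status = :status",
        # Placeholders avoid clashes with reserved words such as "status".
        "ExpressionAttributeNames": {
            "#pk": "userId",
            "#sk": "createdAt",
            "#status": "status",
        },
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":after": {"N": str(created_after)},  # numbers are sent as strings
            ":status": {"S": "shipped"},
        },
    }


params = build_query_params("Orders", "user-1", 1650196800)
```

With boto3, this could be passed as `client.query(**params)`.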
The final method to retrieve data is scan. This command reads every item in your table. It is very inefficient because it does not use keys at all: it only accepts filters, which do not reduce the amount of data DynamoDB has to read.
As you can see, there is no native way to handle joins in DynamoDB. This is in contrast to MongoDB, where you can use the $lookup operator to perform joins, which are handled on the database layer itself. Although it is possible to make multiple retrieval commands that act as joins, this increases the latency of your application. It is recommended that you model your data so that a single query returns the data necessary for each of your use cases.
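One common way to avoid joins is to store related items under the same partition key so that a single query returns all of them. The sketch below imitates this with an in-memory dictionary standing in for a table. The `pk` and `sk` attribute names and the `USER#` / `ORDER#` prefixes are hypothetical conventions, not something DynamoDB requires.

```python
from collections import defaultdict

# In-memory stand-in for a table: partition key -> items sorted by sort key.
table = defaultdict(list)


def put(item):
    """Store an item and keep its partition ordered by sort key."""
    table[item["pk"]].append(item)
    table[item["pk"]].sort(key=lambda stored: stored["sk"])


def query(pk, sk_prefix=""):
    """Return items in one partition whose sort key starts with the prefix."""
    return [item for item in table[pk] if item["sk"].startswith(sk_prefix)]


put({"pk": "USER#1", "sk": "PROFILE", "name": "Ada"})
put({"pk": "USER#1", "sk": "ORDER#2022-04-10", "total": 12})
put({"pk": "USER#1", "sk": "ORDER#2022-04-01", "total": 30})
put({"pk": "USER#2", "sk": "PROFILE", "name": "Grace"})

user_with_orders = query("USER#1")        # profile and orders in one read
orders_only = query("USER#1", "ORDER#")   # narrowed by sort key prefix
```

In MongoDB this might have been a users collection joined to an orders collection with $lookup. Here, one query on the partition key returns both.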
Filters in DynamoDB can really bite you if you aren't careful. It's important to understand that filters primarily save you bandwidth. A filter is applied after DynamoDB has read the items, by checking every item and discarding the ones that don't match. There are no database optimizations or indexes that make this more efficient. Think of it as an application-side filter function. The only thing you are saving at that point is bandwidth, not read capacity or performance.
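The following in-memory simulation shows why a filter saves bandwidth but not work. It is only a model of the behavior, not a call to DynamoDB. The `Count` and `ScannedCount` names mirror the fields DynamoDB returns in query and scan responses, while the function name and the sample orders are hypothetical.

```python
def scan_with_filter(items, predicate):
    """Simulate a filtered scan: every item is read, then the filter runs."""
    scanned = 0
    matched = []
    for item in items:
        scanned += 1  # the item is read whether or not it matches
        if predicate(item):
            matched.append(item)
    return {"Items": matched, "Count": len(matched), "ScannedCount": scanned}


# Hypothetical data: 1000 orders, of which every tenth one has shipped.
orders = [
    {"orderId": i, "status": "shipped" if i % 10 == 0 else "pending"}
    for i in range(1000)
]
result = scan_with_filter(orders, lambda order: order["status"] == "shipped")
```

Only 100 items come back over the network, but all 1000 were read. If `ScannedCount` is much larger than `Count` in your real responses, that access pattern probably deserves its own key or index.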
Now, the big question: how to actually perform the migration? There are a few ways to do this. We will discuss two possible solutions below, but there are many other ways to do this.
First, you can schedule maintenance, migrate your data, then deploy your code to switch to DynamoDB. With this method, you need to make sure no writes are being sent to MongoDB while you are migrating; otherwise that data could be lost. This method can be good for small applications with a limited amount of data. However, it does mean downtime for your application.
The other common migration method is to start writing your data to both MongoDB and DynamoDB. Once that stage is deployed and working, you can migrate your existing MongoDB data to DynamoDB. At this stage, your data should be identical and consistent between your MongoDB and DynamoDB systems. Once you have verified that, you can start pointing your reads/queries at your DynamoDB database. Once this is all complete, you can stop writing to MongoDB and delete your MongoDB database. This has the advantage of not having to schedule downtime, but it is a much more complicated migration process as it requires multiple stages.
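The stages of the dual-write approach can be sketched as follows. Plain dictionaries stand in for the MongoDB collection and the DynamoDB table, and every class and function name here is hypothetical. Real code would call the respective database clients and handle partial failures, which this sketch does not.

```python
class DualWriteRepository:
    """Writes to both stores; reads from whichever store is primary."""

    def __init__(self, mongo_store, dynamo_store):
        self.mongo = mongo_store
        self.dynamo = dynamo_store
        self.read_from_dynamo = False  # flipped once the data is verified

    def save(self, key, document):
        # Stage 1: every new write goes to both databases.
        self.mongo[key] = document
        self.dynamo[key] = document

    def get(self, key):
        store = self.dynamo if self.read_from_dynamo else self.mongo
        return store.get(key)


def backfill(mongo_store, dynamo_store):
    """Stage 2: copy older documents without overwriting newer writes."""
    for key, document in mongo_store.items():
        dynamo_store.setdefault(key, document)


def find_mismatches(mongo_store, dynamo_store):
    """Stage 3: return keys whose documents differ between the stores."""
    all_keys = set(mongo_store) | set(dynamo_store)
    return sorted(
        key for key in all_keys if mongo_store.get(key) != dynamo_store.get(key)
    )


mongo = {"user-1": {"name": "Ada"}}  # written before dual writes began
dynamo = {}
repo = DualWriteRepository(mongo, dynamo)
repo.save("user-2", {"name": "Grace"})
backfill(mongo, dynamo)
mismatches = find_mismatches(mongo, dynamo)
if not mismatches:
    repo.read_from_dynamo = True  # Stage 4: point reads at DynamoDB
```

Against a real table, the "don't overwrite" behavior of the backfill could be expressed with a condition expression such as `attribute_not_exists` on the key attribute, so that a backfilled copy never replaces a document already written by the dual-write path.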
Of course, no matter what method you choose, you should thoroughly test your migration process and DynamoDB code in a test environment before running the migration in production.
Mongoose is a very popular tool when using MongoDB in Node.js. It allows for strict schema validation along with a higher-level API that makes it much easier to interact with your database.
Dynamoose is a great alternative to Mongoose for DynamoDB. It offers a very similar API/interface to Mongoose, which helps make migration incredibly easy.
Additionally, Dynamoose provides schema validation features that make it possible to provide a more strict schema on top of DynamoDB.
It is important to note that while Dynamoose is not a drop-in replacement for Mongoose, it was heavily inspired by Mongoose and has a very similar API. If you are already using Mongoose in your application, Dynamoose is a library you should definitely consider using.
You can read more about Dynamoose on the project's website.
DynamoDB Starting Tips/Tricks
It is important to remember that although migrating from MongoDB to DynamoDB is a much easier process than migrating from an SQL-based database, DynamoDB is still a completely different database platform. Therefore, there are a few tips and tricks you should consider during your migration process.
The first great resource is Dynobase's extensive collection of DynamoDB tutorials.
Next, as is true with any database system, it is highly recommended you think through your access patterns. What are all the different ways you want to access/query your data? This helps you define what indexes you want to create and how you want to structure your data.
Finally, test and try things. Write down every complexity in your application, build test applications that call DynamoDB APIs, and get comfortable with the database. You will find that some things are far easier to achieve with DynamoDB than with MongoDB, and you will also find that some things that were easy to achieve with MongoDB are difficult or impossible to achieve with DynamoDB without rethinking your data model.
Migrating to a new database is never a plug-and-play process. It requires a lot of planning and thinking about how to best handle that migration. However, MongoDB and DynamoDB have a lot of similarities that make migration more straightforward.
Good luck with your migration!
© 2022 Dynobase