A Beginner’s Overview of DynamoDB

Amazon DynamoDB is a hugely popular NoSQL database. It was internally developed at Amazon to address a need for a scalable and highly available key-value store.

Even though it’s powerful, DynamoDB has a relatively ‘different’ way of doing things when it comes to data modelling and defining access patterns. This can make it challenging if you’re coming from an RDBMS, or even from a DynamoDB competitor like MongoDB, and expecting things to look familiar.

In this article, you’ll learn about the key features of DynamoDB, followed by an overview of its core concepts. You’ll learn about availability, pricing, integration with other AWS services, and more. By the end, you’ll have a solid understanding of what DynamoDB has to offer and the benefits it provides.

So let’s get started.

Key Features

Managed NoSQL Database

DynamoDB is a managed NoSQL database that supports key-value and document data models. It achieves high availability by replicating your data across multiple storage nodes, and it scales by partitioning that data across them.

Because it’s a managed service, you don’t need to worry about maintaining any of the underlying hardware. DynamoDB takes care of keeping the infrastructure in a safe and healthy state.

DynamoDB is optimized for performance at scale, which is why it’s such a popular choice for high-throughput applications like DoorDash. It scales out horizontally by adding more partitions (storage nodes) and spreading your data across them. This means that even as the amount of data in your table grows, latency stays consistently low. This is a critical feature for large-scale systems.
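To illustrate the idea of spreading data by partition key, here is a toy sketch of hash-based partitioning in Python. It is not DynamoDB’s actual algorithm: the MD5 hash, the plain modulo, and the hypothetical account-N keys are assumptions made purely for illustration, since DynamoDB’s internal hashing and partition layout aren’t exposed to you.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to one of num_partitions buckets by hashing it.

    Toy illustration only: DynamoDB's real hash function and the way it
    assigns hash ranges to physical partitions are internal to AWS.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    # Hypothetical keys; with a good hash they spread roughly evenly.
    demo_keys = [f"account-{i}" for i in range(10_000)]
    demo_counts = Counter(partition_for(k, 4) for k in demo_keys)
    print(sorted(demo_counts.items()))
```

Because each key always hashes to the same bucket, a lookup by partition key only has to touch one partition, however many partitions exist. A plain modulo like this reshuffles most keys when the partition count changes, so production systems typically split hash ranges instead.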

High Availability and Durability

DynamoDB is designed for high availability. Its Service Level Agreement (SLA) commits to 99.99% monthly uptime for standard tables and 99.999% for Global Tables. 99.99% works out to roughly 52 minutes of downtime per year (about 4.3 minutes per month), while 99.999% allows about 5 minutes per year (about 26 seconds per month).
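The arithmetic behind these downtime figures is easy to check yourself. This small Python helper (a hypothetical name, not part of any AWS SDK) converts an availability percentage into the downtime it permits over a period, assuming a 365-day year and a 30-day month.

```python
def allowed_downtime_seconds(availability_pct: float, period_days: float) -> float:
    """Return the downtime, in seconds, permitted by an availability target over a period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (100.0 - availability_pct) / 100.0


for pct in (99.99, 99.999):
    per_year_min = allowed_downtime_seconds(pct, 365) / 60  # 365-day year, in minutes
    per_month_s = allowed_downtime_seconds(pct, 30)  # 30-day month, in seconds
    print(f"{pct}%: {per_year_min:.1f} min/year, {per_month_s:.0f} s/month")
```

Running it shows that 99.99% corresponds to roughly 52.6 minutes per year (about 259 seconds per 30-day month), while 99.999% corresponds to about 5.3 minutes per year and 26 seconds per month.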

It is also highly durable. Data is stored redundantly across multiple nodes in multiple Availability Zones (AZs). If one of these nodes fails due to a hardware failure or natural disaster, DynamoDB automatically promotes a replica in a healthy AZ to become the new primary. The possibility of data loss is non-zero, but for most applications it is a negligible concern.

Knowing Your Access Patterns

It’s important to think about your application’s access patterns before deciding on your table schema. One of the big trade-offs with DynamoDB is that a Query requires you to know the partition key of the items you want. (A Scan can read items without it, but it reads every item in the table, which is slow and expensive on large tables.) This is different from relational databases, where ad hoc SQL statements like SELECT * FROM table WHERE ... are ubiquitous.

All this means you need to think about how you want to access your data and define a table structure that supports efficient queries (more on this later). A well-thought-out schema lets you take full advantage of DynamoDB’s availability and performance; a poorly designed one can lead to higher query latencies and higher cost.
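To make the partition-key requirement concrete, here is a minimal Python sketch that builds the request for a Query against the Accounts table used later in this article. It assumes AccountId is a string attribute and only builds the parameters; sending them requires Boto3 and AWS credentials, which is why the actual call is left as a comment. The helper name and the A-1001 value are hypothetical.

```python
def build_account_query(account_id: str, table_name: str = "Accounts") -> dict:
    """Build keyword arguments for a low-level Boto3 DynamoDB client's query() call.

    A Query must include an equality condition on the partition key;
    a condition on the sort key is optional.
    """
    return {
        "TableName": table_name,
        # "#pk" stands in for the attribute name, ":pk" for the value.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "AccountId"},
        # Low-level API values carry a type descriptor; "S" assumes a string key.
        "ExpressionAttributeValues": {":pk": {"S": account_id}},
    }


params = build_account_query("A-1001")
print(params)
# With Boto3 installed and AWS credentials configured, you would send it with:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
#   items = response["Items"]
```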

If you’re looking for hands-on experience with DynamoDB data modelling, check out Alex DeBrie’s The DynamoDB Book. I have a review article on it here.

Access & Security

Accessing DynamoDB programmatically is straightforward using the AWS SDK, which is available for many languages, each with its own higher-level utilities. In Java, a popular choice is DynamoDBMapper (or the DynamoDB Enhanced Client in version 2 of the SDK); in Python, Boto3, the AWS SDK for Python, is the standard option.

Authorization to access your DynamoDB table is handled by AWS Identity and Access Management (IAM). IAM lets you specify who has access to your table and which API actions they can call. For example, if there is an IAM user named Daniel, I could attach a policy that grants him the dynamodb:Query permission so he can query items in a table. You can go one step further and limit a user to a specific set of operations on a specific table.

DynamoDB also supports fine-grained, item-level and attribute-level authorization. Using IAM policy conditions, you can restrict which items (by partition key) and which attributes of those items a user can access. This is a powerful data governance feature that can be used to hide sensitive attributes from certain users.
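As a rough illustration of both ideas, this Python sketch builds an IAM policy document that allows only Query and GetItem on a single table and, optionally, restricts which attributes can be read using the dynamodb:Attributes and dynamodb:Select condition keys. The table ARN, Region, account ID, and helper name are placeholders; check the IAM documentation for DynamoDB before relying on a policy like this.

```python
import json


def query_policy(table_arn: str, allowed_attributes=None) -> dict:
    """Build an IAM policy allowing Query and GetItem on one table.

    If allowed_attributes is given, reads are limited to those attributes.
    """
    statement = {
        "Effect": "Allow",
        "Action": ["dynamodb:Query", "dynamodb:GetItem"],
        "Resource": table_arn,
    }
    if allowed_attributes:
        statement["Condition"] = {
            # Every attribute the request touches must be in this list...
            "ForAllValues:StringEquals": {"dynamodb:Attributes": list(allowed_attributes)},
            # ...and requests must ask for specific attributes, not all of them.
            "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder Region and account ID.
policy = query_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Accounts",
    allowed_attributes=["AccountId", "CreationDate", "Country"],
)
print(json.dumps(policy, indent=2))
```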

Using IAM eliminates the need to pass around usernames and passwords as you typically would in a traditional RDBMS setup. Instead, the SDK signs your requests with the AWS credentials associated with your IAM identity (access keys, or preferably temporary credentials from an IAM role) when you interact with DynamoDB programmatically.

Integration with Other AWS Services

My favorite feature of DynamoDB is its integration with other AWS services.

Using DynamoDB Streams, for example, you can capture item-level changes (inserts, updates, and deletes) to your table and pipe them into a Lambda function that reacts to each change.
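A minimal sketch of such a Lambda function in Python follows. It assumes the stream is configured with a view type that includes new images (for example NEW_AND_OLD_IMAGES). The sample event is hypothetical and trimmed to the fields the handler reads; real stream records carry additional metadata.

```python
def handler(event, context):
    """Lambda entry point for a DynamoDB Streams trigger: log each item-level change."""
    event_names = []
    for record in event.get("Records", []):
        event_name = record["eventName"]  # "INSERT", "MODIFY" or "REMOVE"
        change = record["dynamodb"]
        keys = change["Keys"]  # key attributes, in DynamoDB's typed format
        # NewImage is only present when the stream view type includes new images
        # and the change isn't a deletion.
        new_image = change.get("NewImage")
        print(f"{event_name} {keys} -> {new_image}")
        event_names.append(event_name)
    return {"processed": len(event_names), "events": event_names}


# Hypothetical, trimmed-down event for local testing.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"AccountId": {"S": "A-1001"}, "CreationDate": {"S": "2024-01-15"}},
                "NewImage": {
                    "AccountId": {"S": "A-1001"},
                    "CreationDate": {"S": "2024-01-15"},
                    "Country": {"S": "CA"},
                },
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"AccountId": {"S": "A-0999"}, "CreationDate": {"S": "2023-06-01"}},
            },
        },
    ]
}

print(handler(sample_event, None))
```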

You can also integrate it with S3 to export table data, or stream changes into Amazon Kinesis Data Streams, a real-time data streaming service, to analyze your data as it passes through the stream.

A third powerful option is integration with AWS Step Functions, an orchestration service that lets you define workflows as a series of steps. Using Step Functions with DynamoDB, you can perform CRUD operations (such as GetItem, PutItem, UpdateItem, and DeleteItem) directly on your table without needing a middleware compute service like EC2 or Lambda.
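For illustration, here is a Python sketch that builds an Amazon States Language definition with a single Task state using the Step Functions DynamoDB GetItem integration. The table and key names come from the Accounts example in this article; the input field names accountId and creationDate, and the helper name, are assumptions.

```python
import json


def get_account_state_machine(table_name: str = "Accounts") -> dict:
    """Build an Amazon States Language definition that reads one item from DynamoDB."""
    return {
        "Comment": "Fetch an account item directly from DynamoDB, no Lambda in between",
        "StartAt": "GetAccount",
        "States": {
            "GetAccount": {
                "Type": "Task",
                # Step Functions' optimized integration for the DynamoDB GetItem API.
                "Resource": "arn:aws:states:::dynamodb:getItem",
                "Parameters": {
                    "TableName": table_name,
                    "Key": {
                        # A ".$" suffix tells Step Functions to read the value
                        # from the execution input using a JSONPath.
                        "AccountId": {"S.$": "$.accountId"},
                        "CreationDate": {"S.$": "$.creationDate"},
                    },
                },
                "End": True,
            }
        },
    }


print(json.dumps(get_account_state_machine(), indent=2))
```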

This is just a small subset of the service integrations DynamoDB offers. I encourage you to check out the AWS documentation for more details on service integrations.

Cost-effective Usage-Based Payment Model

A core consideration for any application is the cost of its data store. In a typical RDBMS environment, this means paying for provisioned hardware. DynamoDB takes a different approach to pricing. Because it is a fully managed AWS service, you aren’t charged for hardware. Instead, your bill is based primarily on two things: 1) the read/write operations you perform on your table, and 2) the amount of data your table stores.

For reads and writes, you are charged separately for read operations (e.g. GetItem, Query, Scan) and write operations (e.g. PutItem, UpdateItem, DeleteItem). The amount you pay scales with the number of operations you perform and the size of the items they touch.

Some more context is provided by the AWS DynamoDB documentation:

“DynamoDB charges one write request unit for each write (up to 1 KB) and two write request units for transactional writes. For reads, DynamoDB charges one read request unit for each strongly consistent read (up to 4 KB), two read request units for each transactional read, and one-half read request unit for each eventually consistent read.”
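The rules quoted above reduce to a bit of rounding. The Python sketch below computes the request units for a single read or write of one item, given its size in bytes. It assumes you already know the item size (DynamoDB computes it from attribute names and values), and the function names are hypothetical.

```python
import math

KB = 1024


def write_request_units(item_size_bytes: int, transactional: bool = False) -> int:
    """Units for one write: 1 per started 1 KB (minimum 1), doubled if transactional."""
    units = max(1, math.ceil(item_size_bytes / KB))
    return units * 2 if transactional else units


def read_request_units(item_size_bytes: int, mode: str = "strong") -> float:
    """Units for one read: 1 per started 4 KB (minimum 1) for a strongly consistent read.

    mode is "eventual" (half), "strong" (1x) or "transactional" (2x).
    """
    units = max(1, math.ceil(item_size_bytes / (4 * KB)))
    multiplier = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}[mode]
    return units * multiplier


print(write_request_units(3584))                      # 3.5 KB write
print(write_request_units(3584, transactional=True))  # same write in a transaction
print(read_request_units(9 * KB, mode="eventual"))    # 9 KB eventually consistent read
```

For example, writing a 3.5 KB item costs 4 write units (8 if transactional), and a strongly consistent read of a 9 KB item costs 3 read units. For a Query, DynamoDB sums the size of all items returned and rounds the total up to the next 4 KB, rather than rounding each item separately.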

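In on-demand mode, these are called read request units and write request units (RRUs and WRUs). In provisioned mode, the equivalent throughput is measured in read capacity units and write capacity units (RCUs and WCUs), sized by the same rules. You’ll see these terms throughout the DynamoDB console and in the metrics DynamoDB publishes to CloudWatch (for example, ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits).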

It’s also important to keep in mind that DynamoDB offers two capacity modes: provisioned and on-demand. In provisioned mode, you specify the number of RCUs and WCUs per second that your table should support.

For example, I can configure my table with 100 RCUs and 100 WCUs. This is great for applications with a predictable, consistent workload (e.g. a flat line of traffic). Provisioned mode also supports auto scaling, which lets DynamoDB adjust your table’s capacity between minimum and maximum values you set. Short spikes above the provisioned capacity usually aren’t a big deal, because DynamoDB retains some unused capacity as burst capacity. However, sustained reads or writes beyond the provisioned RCUs/WCUs lead to throttled requests.
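If you choose provisioned mode, the same sizing rules give a rough way to estimate capacity. This Python sketch assumes a steady workload in which every item is the same size; the function name and example numbers are hypothetical, and real workloads usually need headroom on top of the estimate.

```python
import math
from typing import Tuple

KB = 1024


def required_capacity(reads_per_sec: float, writes_per_sec: float,
                      item_size_bytes: int, eventually_consistent: bool = False) -> Tuple[int, int]:
    """Estimate the (RCUs, WCUs) needed to sustain a steady workload of same-sized items."""
    # One RCU = one strongly consistent read per second of up to 4 KB
    # (or two eventually consistent reads of that size).
    rcus = reads_per_sec * max(1, math.ceil(item_size_bytes / (4 * KB)))
    if eventually_consistent:
        rcus /= 2
    # One WCU = one standard write per second of up to 1 KB.
    wcus = writes_per_sec * max(1, math.ceil(item_size_bytes / KB))
    # Capacity is provisioned in whole units, so round up.
    return math.ceil(rcus), math.ceil(wcus)


print(required_capacity(100, 10, 6 * KB))
print(required_capacity(100, 10, 6 * KB, eventually_consistent=True))
```

For instance, 100 strongly consistent reads and 10 writes per second of 6 KB items works out to 200 RCUs and 60 WCUs; switching the reads to eventually consistent halves the RCUs to 100.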

The other capacity mode (the one I tend to prefer) is on-demand mode. In on-demand mode, DynamoDB automatically allocates more resources to your table as traffic changes. This is great for either a) spiky workloads where traffic is variable and unpredictable, or b) teams who don’t want to manage capacity and would rather let DynamoDB handle it.

Whichever option you choose, DynamoDB gives you the flexibility to decide how you want to be billed, and its pricing is reasonable for most workloads.

Now that you know a bit about what DynamoDB offers, let’s briefly discuss its core concepts.

Core Concepts – Tables, Items, Attributes, Indexes

The image below illustrates the modelling of an “Accounts” table. We’ll be referring to it throughout the remainder of this article.

Tables

As in an RDBMS, tables are used to organize collections of data. A table can be single-Region or multi-Region; multi-Region tables are known as Global Tables. Global Tables add another level of redundancy, but they are most useful for applications that need to share the same data store across the globe.

Creating a table involves specifying a table name, a partition key, an optional sort key, and the capacity mode. In our example diagram above, the partition key is AccountId and the sort key is CreationDate. Together they form the table’s primary key, so the combination of these two values must be unique.
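Here is a Python sketch of the parameters you might pass to Boto3’s create_table call for the Accounts example. It assumes both AccountId and CreationDate are stored as strings (for example, ISO 8601 dates), and the helper name is hypothetical; the actual call is left in a comment because it needs AWS credentials.

```python
def accounts_table_definition(on_demand: bool = True, rcu: int = 100, wcu: int = 100) -> dict:
    """Build keyword arguments for Boto3's create_table() for the Accounts example."""
    params = {
        "TableName": "Accounts",
        # Composite primary key: partition (HASH) key plus sort (RANGE) key.
        "KeySchema": [
            {"AttributeName": "AccountId", "KeyType": "HASH"},
            {"AttributeName": "CreationDate", "KeyType": "RANGE"},
        ],
        # Only key attributes (including index keys) are declared up front;
        # all other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "AccountId", "AttributeType": "S"},
            {"AttributeName": "CreationDate", "AttributeType": "S"},
        ],
    }
    if on_demand:
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu}
    return params


print(accounts_table_definition())
# With Boto3 and credentials configured:
#   import boto3
#   boto3.client("dynamodb").create_table(**accounts_table_definition(on_demand=False))
```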

Items & Attributes

Each table consists of individual items and their attributes. Attributes are name/value pairs, where the value can be a String, Number, Binary, Boolean, Null, List, Map, or Set (of strings, numbers, or binary values).
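Under the hood, DynamoDB’s low-level API represents every attribute value with a type descriptor such as S (string), N (number), or BOOL. This hypothetical Python helper sketches how plain values map onto that format for a subset of types; for real use, Boto3 ships its own TypeSerializer. Like Boto3, it rejects floats and expects Decimal for non-integer numbers.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed attribute-value format (subset of types)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
    # Floats, empty or mixed sets, binary values, etc. are not handled by this sketch.
    raise TypeError(f"unsupported value: {value!r}")


# Hypothetical item from the Accounts table.
account_item = {
    "AccountId": "A-1001",
    "Balance": Decimal("250.75"),
    "Active": True,
    "Tags": {"vip", "beta"},
    "Address": {"Country": "CA", "City": "Toronto"},
}
print({name: to_attribute_value(v) for name, v in account_item.items()})
```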

Each item can have one or more attributes. Each row in our example from above is a single unique Item.

Indexes

Indexes allow you to increase the number of ways you can access the data in your Table. Without the use of indexes, we can only access our data using our partition key and sort key. This means that in the case of our example above, we can only retrieve records by the AccountId field. But what if we want to, for example, get all records with the same Country? To achieve this, we need to use a Global Secondary Index (GSI).

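A global secondary index is an additional index on your table with its own partition key (and, optionally, its own sort key), which can be any attribute in your table. DynamoDB maintains the index as a separate copy of some or all of your items’ attributes, and you can query it by that key. For example, if we created a GSI with Country as its partition key, we could query on the Country attribute to find all items with the same country. This really opens up the ways in which we can access data in our table. However, not all good things come for free. GSIs have additional costs (writes to the table also consume write capacity on the index, and the index adds storage) and consistency implications: reads from a GSI are always eventually consistent. This article by Alex DeBrie goes into more detail on DynamoDB consistency.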
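Querying the index looks much like querying the table, with an extra IndexName. This Python sketch assumes a hypothetical GSI named CountryIndex whose partition key is the Country string attribute; the helper name is also hypothetical, and it only builds the request parameters for a Boto3 query call.

```python
def build_country_query(country: str) -> dict:
    """Build keyword arguments for a Boto3 query() against a hypothetical Country GSI."""
    return {
        "TableName": "Accounts",
        "IndexName": "CountryIndex",  # hypothetical GSI with Country as its partition key
        "KeyConditionExpression": "#country = :country",
        "ExpressionAttributeNames": {"#country": "Country"},
        "ExpressionAttributeValues": {":country": {"S": country}},
        # GSIs only support eventually consistent reads; True would be rejected.
        "ConsistentRead": False,
    }


print(build_country_query("CA"))
```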

A second, less commonly used type of index is the Local Secondary Index (LSI). You can learn more about them here.

Wrap Up

In this article, we’ve talked about the many features DynamoDB has to offer and why it’s such a popular database choice. I hope I’ve given you enough detail to get started.

As a next step, check out these other articles below on DynamoDB.
