Trying to understand the difference between a DynamoDB Partition Key and Sort Key? Confused with all the key terminology and looking for a simple explanation? This is the article for you.
In this article, you’ll learn what a partition key is versus what a sort key is. You’ll also learn some of the additional query flexibility you’ll gain by using a DynamoDB Sort Key.
What is a Partition Key?
When creating a Table and defining your DynamoDB schema, one of the first things you’ll be asked to do is specify your Table’s Partition Key and, optionally, its Sort Key. This is an important decision that affects how your table’s records can be accessed.
A table’s primary key (its partition key and, if used, its sort key) also cannot be changed after the table is created, so it’s important to get it right the first time.
A DynamoDB Partition Key has to do with DynamoDB’s internal physical storage structure. The partition key is the attribute that DynamoDB will use to partition your data onto one of its many storage nodes.
This is part of the reason DynamoDB is so scalable. Because it hashes each record’s partition key value to decide where that record lives, DynamoDB can spread data across an arbitrary number of storage partitions and scale to meet increased demand by adding partitions and redistributing data among them.
Partition Key Example
For example, say we have a CustomerOrders table and our partition key is CustomerId. Whenever you insert a new record into your database, the value of your record’s CustomerId will be used as an input to a hashing function.
As a general reminder, a hash function maps an input to an output deterministically: the same input always produces the same output.
In this case, the hash function’s input is our CustomerId value (say, CID-123), and its output determines the partition on which DynamoDB will internally store this record’s data. Which partition DynamoDB chooses is completely invisible to you as a user.
However, the partitioning mechanism DynamoDB employs does affect your table’s maximum throughput, because each partition has its own capacity limits and a heavily accessed partition key value can turn into a “hot” partition (more of an advanced topic).
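To make the hashing step concrete, here is a minimal sketch of mapping a partition key value to a partition. DynamoDB’s actual hash function and partition layout are internal and undocumented, so SHA-256 plus modulo and the `partition_for` helper are assumptions chosen purely for illustration. What the sketch does show faithfully is the property the article relies on: the same CustomerId always lands on the same partition.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key value to a partition index.

    Illustrative only: DynamoDB's real hash function is internal.
    SHA-256 plus modulo is a stand-in assumed for this sketch.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer,
    # then bucket it into one of num_partitions partitions.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    # The same CustomerId always maps to the same partition index.
    for customer_id in ["CID-123", "CID-456", "CID-123"]:
        print(customer_id, "->", partition_for(customer_id, 4))
```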
Note that if you decide to only specify a partition key and not a sort key, all records must have a unique partition key value. In other words, you will only be able to have one record with CustomerId as CID-123. You’ll see this is not the case if you opt to use a sort key in combination with a partition key, which we’ll explore below.
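The one-record-per-key rule can be modeled with an ordinary dictionary keyed by the partition key value. This is a toy in-memory sketch, not a DynamoDB client: the `PartitionKeyOnlyTable` class and the order dates are hypothetical. It mirrors DynamoDB’s PutItem behavior, where writing an item whose primary key already exists replaces the existing item instead of adding a second one.

```python
from typing import Any, Dict, Optional


class PartitionKeyOnlyTable:
    """Toy in-memory model of a table whose primary key is just a partition key.

    Like DynamoDB's PutItem, writing an item whose key value already exists
    replaces the old item instead of storing a second one.
    """

    def __init__(self, partition_key_name: str) -> None:
        self.partition_key_name = partition_key_name
        self.items: Dict[Any, Dict[str, Any]] = {}

    def put_item(self, item: Dict[str, Any]) -> None:
        # Keyed only by the partition key value, so a repeated key overwrites.
        self.items[item[self.partition_key_name]] = dict(item)

    def get_item(self, key_value: Any) -> Optional[Dict[str, Any]]:
        return self.items.get(key_value)


if __name__ == "__main__":
    table = PartitionKeyOnlyTable("CustomerId")
    # Hypothetical orders for the same customer.
    table.put_item({"CustomerId": "CID-123", "OrderDate": "2021-11-05"})
    table.put_item({"CustomerId": "CID-123", "OrderDate": "2022-01-15"})
    print(len(table.items))  # 1: the second write replaced the first
    print(table.get_item("CID-123"))
```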
What is a Sort Key?
A Sort Key is a secondary key that you can optionally decide to use alongside your Partition Key. In other words, your table’s Primary Key can be either just a Partition Key, or a Partition Key + a Sort Key.
A Partition Key and Sort Key combination is known as a composite primary key.
With a composite primary key, you can store multiple records with the same partition key value as long as their sort key values differ. All records with the same partition key value are stored together on the same storage partition, ordered by their sort key value.
This layout allows for some interesting query access patterns. It can be a bit confusing to visualize, so let’s take a look at a practical example of a Partition and Sort Key in action.
Note that a Sort Key is also known as a Range Key. The two terms are generally used interchangeably.
Sort Key Example
Assume we have the same CustomerOrders table, and this time we decide to use CustomerId as the Partition Key and OrderDate as the Sort Key. Our table structure and sample records will look a little something like this.
| CustomerId (Partition Key) | OrderDate (Sort Key) |
Notice how our first three records have the same CustomerId value CID-123 but with different OrderDates. The combination of CustomerId and OrderDate is unique and therefore DynamoDB allows us to store these records.
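Since the sample records above did not survive, here is a toy in-memory sketch that plays the same role. The `CompositeKeyTable` class is hypothetical and the dates are invented. It groups items by partition key value and requires only the (CustomerId, OrderDate) pair to be unique: a new OrderDate under CID-123 adds a record, and a repeated pair overwrites the existing one. Items come back in sort key order, echoing how DynamoDB keeps an item collection ordered by its sort key.

```python
from typing import Any, Dict, List


class CompositeKeyTable:
    """Toy in-memory model of a table with a composite primary key.

    Items that share a partition key value are grouped together, and only the
    (partition key, sort key) pair has to be unique.
    """

    def __init__(self, partition_key_name: str, sort_key_name: str) -> None:
        self.pk = partition_key_name
        self.sk = sort_key_name
        # partition key value -> {sort key value -> item}
        self.collections: Dict[Any, Dict[Any, Dict[str, Any]]] = {}

    def put_item(self, item: Dict[str, Any]) -> None:
        group = self.collections.setdefault(item[self.pk], {})
        # A repeated (partition key, sort key) pair overwrites; a new sort
        # key value under the same partition key adds another item.
        group[item[self.sk]] = dict(item)

    def items_for(self, partition_value: Any) -> List[Dict[str, Any]]:
        """Return every item for one partition key value, in sort key order."""
        group = self.collections.get(partition_value, {})
        return [group[sort_value] for sort_value in sorted(group)]


if __name__ == "__main__":
    orders = CompositeKeyTable("CustomerId", "OrderDate")
    # Hypothetical sample rows; the dates are invented for illustration.
    for customer_id, order_date in [
        ("CID-123", "2022-03-02"),
        ("CID-123", "2021-11-05"),
        ("CID-123", "2022-01-15"),
        ("CID-456", "2022-02-10"),
    ]:
        orders.put_item({"CustomerId": customer_id, "OrderDate": order_date})

    for item in orders.items_for("CID-123"):
        print(item["OrderDate"])  # 2021-11-05, then 2022-01-15, then 2022-03-02
```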
Sorting, Equality, and More with your Sort Key
By using a Sort key, we’re also able to perform what I call “range-like” queries on our Sort Key values.
For example, we can query DynamoDB for all records with CustomerId CID-123 and an OrderDate greater than 2021-12-31. Because string sort keys are compared lexicographically, and dates in YYYY-MM-DD format sort chronologically as strings, this operation will return orders from 2022 onward. The greater than operation is one of several available options. Others include:
- = (equal to)
- < (less than)
- <= (less than or equal to)
- > (greater than)
- >= (greater than or equal to)
- between (within an inclusive range)
- begins_with (starts with a given prefix)
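The sort key conditions listed above can be imitated over a plain Python list to see what each one returns. This is a local simulation, not a call to DynamoDB: the `query` helper, the `SORT_KEY_CONDITIONS` table, and the sample rows are hypothetical. It does reflect one real rule of DynamoDB queries: the partition key is always matched with equality, and at most one of these conditions applies to the sort key.

```python
from typing import Any, Callable, Dict, List

# Hypothetical sample rows; the dates are invented for illustration.
ORDERS: List[Dict[str, str]] = [
    {"CustomerId": "CID-123", "OrderDate": "2021-11-05"},
    {"CustomerId": "CID-123", "OrderDate": "2022-01-15"},
    {"CustomerId": "CID-123", "OrderDate": "2022-03-02"},
    {"CustomerId": "CID-456", "OrderDate": "2022-02-10"},
]

# One predicate per sort key condition. DynamoDB compares string sort keys by
# their UTF-8 bytes, which yields the same order as Python's string comparison.
SORT_KEY_CONDITIONS: Dict[str, Callable[..., bool]] = {
    "=": lambda value, operand: value == operand,
    "<": lambda value, operand: value < operand,
    "<=": lambda value, operand: value <= operand,
    ">": lambda value, operand: value > operand,
    ">=": lambda value, operand: value >= operand,
    "between": lambda value, low, high: low <= value <= high,  # inclusive
    "begins_with": lambda value, prefix: value.startswith(prefix),
}


def query(items: List[Dict[str, Any]], pk_name: str, pk_value: Any,
          sk_name: str, op: str, *operands: Any) -> List[Dict[str, Any]]:
    """Simulate a key-condition query: equality on the partition key, then
    one condition on the sort key, with results in ascending sort key order."""
    condition = SORT_KEY_CONDITIONS[op]
    matches = [
        item for item in items
        if item[pk_name] == pk_value and condition(item[sk_name], *operands)
    ]
    return sorted(matches, key=lambda item: item[sk_name])


if __name__ == "__main__":
    after_2021 = query(ORDERS, "CustomerId", "CID-123", "OrderDate", ">", "2021-12-31")
    print([o["OrderDate"] for o in after_2021])  # ['2022-01-15', '2022-03-02']
```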
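You’re also able to sort your results in either ascending or descending order based on the OrderDate value. Queries return items in ascending sort key order by default, and setting the Query parameter ScanIndexForward to false returns them in descending order.
Together, these capabilities support access patterns such as “fetch a customer’s most recent orders” directly from your table’s key structure.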
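As a sketch of the “most recent orders” pattern, the helper below only builds the request parameters for DynamoDB’s low-level Query operation and never contacts AWS. The parameter names (`TableName`, `KeyConditionExpression`, `ExpressionAttributeValues`, `ScanIndexForward`, `Limit`) are real Query parameters. The `latest_orders_request` function is hypothetical, and treating CustomerId and OrderDate as string (`S`) attributes is an assumption based on the article’s example.

```python
from typing import Any, Dict


def latest_orders_request(customer_id: str, after_date: str, limit: int) -> Dict[str, Any]:
    """Build parameters for a low-level DynamoDB Query that returns a
    customer's newest orders placed after a given date.

    Hypothetical helper: table and attribute names follow the article's
    CustomerOrders example and are assumed to be string (S) attributes.
    """
    return {
        "TableName": "CustomerOrders",
        # Partition key uses equality; the sort key takes one condition.
        "KeyConditionExpression": "CustomerId = :cid AND OrderDate > :after",
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":after": {"S": after_date},
        },
        "ScanIndexForward": False,  # descending sort key order: newest first
        "Limit": limit,             # stop after this many items
    }


if __name__ == "__main__":
    params = latest_orders_request("CID-123", "2021-12-31", 5)
    print(params["KeyConditionExpression"])
    # With boto3 installed and AWS credentials configured, these could be sent as:
    #   boto3.client("dynamodb").query(**params)
```

Reversing the sort order and then limiting the result size is what turns “orders after a date” into “the N newest orders” without reading the customer’s whole order history.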
A Partition Key is simply the key that DynamoDB uses to distribute your data across separate storage partitions. Adding a Sort Key allows us to store multiple records with the same partition key value, since the partition key and sort key together form a unique pair that serves as our primary key.
Sort Keys allow us to perform “range-like” queries on our Table to enable different access patterns.
If you’re interested in learning more about DynamoDB’s capabilities, check out this article on my Top 5 DynamoDB Tips.
You can learn more on Partition Keys and Sort Keys in the AWS documentation here.
There’s also a great discussion on this topic over at Stack Overflow.