Notes on the new DocumentDB partitioning and pricing

The new pricing scheme and partitioning functionality for DocumentDB, made available at Build, was a significant improvement to the DocumentDB offering.

I learned about this a few weeks beforehand, but I did not fully see the benefits until I started looking at how it would affect our current solution.

Recap

With the old pricing model, one could choose between three performance levels per collection, ranging from 250 to 2500 request units (RUs). Each collection could store a maximum of 10 GB of data. This constraint is part of what drives the scalability of DocumentDB: one was forced to prepare the data for sharding. Combined with the predictable request charge of queries, this made it possible to calculate the required collection count, size and cost. However, should the need for re-sharding arise, it would be a manual task, and it would also require a strategy for handling reads and writes while the data was being re-partitioned. Not to mention that this operation would consume RUs as well, often on a system already under stress.
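As a rough illustration of that calculation, here is a minimal sketch. It assumes data and load spread evenly over the collections, and the function name and the workload numbers are made up for the example; only the 10 GB limit comes from the text above.

```python
import math

# Storage limit per collection under the old model (from the text above).
MAX_GB_PER_COLLECTION = 10


def required_collections(total_gb, peak_ru_per_sec, ru_per_collection):
    """Collections needed to satisfy both the storage and the throughput limit.

    Assumes an even spread of data and requests, which real workloads
    rarely achieve, so treat the result as a lower bound.
    """
    by_storage = math.ceil(total_gb / MAX_GB_PER_COLLECTION)
    by_throughput = math.ceil(peak_ru_per_sec / ru_per_collection)
    return max(by_storage, by_throughput, 1)


# Hypothetical workload: 45 GB of data, 3000 RU/s at peak,
# collections provisioned at 1000 RU each.
print(required_collections(45, 3000, 1000))  # storage needs 5, throughput needs 3
```

Whichever limit asks for more collections wins, which is why a storage-heavy workload could end up paying for throughput it never used.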

With this pricing scheme, my recommendation was to aim for S1 collections at 1K RU for normal operations, scaling down to S0 at night or during low-traffic periods, and reserving S2 for high-traffic usage or re-sharding.

Enter Partitioned collections

So, instead of leaving you to deal with this, the DocumentDB team created partitioned collections. In reality it's just a bunch of collections, but instead of paying per collection you pay for the storage used and the request units needed. You no longer need to think about re-sharding or about keeping RUs in reserve for that operation.

Partitioned collections remove a lot of yak hair for Operations

So, what do you need to think about with partitioned collections? Well, each partition still has limits: it's still 10 GB and 10K RU, and the partition also acts as the transactional and query boundary. What does this mean?

  • Make sure you have enough partition keys, with a meaningful distribution.
  • Make sure your data is evenly distributed over your keys.
  • Make sure to query with the partition key.

With just a few keys, you will not write to all of the underlying collections. With a bad spread of keys, you risk a hot partition. And if a significant share of your data belongs to a single partition key, you will run into the per-partition limits.
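One way to check a candidate partition key before committing to it is to simulate the spread offline. The sketch below is only that: it uses MD5 as a stand-in hash (DocumentDB's internal hashing is not something I know, so this is an assumption), and the function names and the partition count are hypothetical.

```python
import hashlib
from collections import Counter


def partition_load(keys, partition_count):
    """Count documents per simulated partition for a list of partition key values.

    MD5 is a stand-in here, not the hash DocumentDB uses internally.
    """
    load = Counter()
    for key in keys:
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        load[int(digest, 16) % partition_count] += 1
    return load


def skew(load, partition_count):
    """Busiest partition relative to the average; 1.0 means perfectly even."""
    total = sum(load.values())
    if total == 0:
        return 0.0
    return max(load.values()) / (total / partition_count)


# Hypothetical data: one document per user id versus everything under one key.
many_keys = partition_load(range(10000), 10)
one_key = partition_load(["tenant-a"] * 10000, 10)

print(len(many_keys), skew(many_keys, 10))  # all partitions used, skew close to 1
print(len(one_key), skew(one_key, 10))      # a single hot partition
```

A skew well above 1, or fewer partitions in use than exist, is a hint that the key will produce a hot partition or hit the per-partition size limit early.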

The request charge for a query will increase drastically if you run multi-partition queries, as the query is parsed and executed per collection. One can experience as much as a 25x increase in RU cost for unbounded queries.
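A back-of-the-envelope model makes the effect easier to see. It assumes the charge grows linearly with the number of partitions a query touches, which is a simplification rather than the actual billing formula, and the 3 RU per-partition charge and the function name are invented for the example.

```python
def estimated_query_charge(ru_per_partition, partitions_touched):
    """Rough RU estimate, assuming the query runs once per partition it touches."""
    return ru_per_partition * partitions_touched


# With the partition key in the query, only one partition is touched.
with_key = estimated_query_charge(3.0, 1)

# Without it, the query fans out to every partition (25 in this made-up case).
without_key = estimated_query_charge(3.0, 25)

print(without_key / with_key)  # 25.0
```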

Overall, though, this change in pricing and functionality is great! These are problems you would have had with the old model too. It's just that you would have had a lot more yak shaving to deal with on top of them.