<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Worlds second best Harald]]></title><description><![CDATA[Notes to self on software development ]]></description><link>https://ulriksenblog.azurewebsites.net/</link><image><url>https://ulriksenblog.azurewebsites.net/favicon.png</url><title>Worlds second best Harald</title><link>https://ulriksenblog.azurewebsites.net/</link></image><generator>Ghost 3.42</generator><lastBuildDate>Mon, 06 Apr 2026 22:43:11 GMT</lastBuildDate><atom:link href="https://ulriksenblog.azurewebsites.net/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[State versioning in Orleans]]></title><description><![CDATA[An example on handling state persistence in stateful systems with continuous delivery.]]></description><link>https://ulriksenblog.azurewebsites.net/state-versioning-in-orleans/</link><guid isPermaLink="false">5e35d0f7e2e748000182ab12</guid><category><![CDATA[Orleans]]></category><category><![CDATA[F#]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 06 Feb 2020 07:48:00 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/fredy-jacob-t0SlmanfFcg-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/fredy-jacob-t0SlmanfFcg-unsplash.jpg" alt="State versioning in Orleans"><p>The documentation on <a href="https://dotnet.github.io/orleans/Documentation/deployment/grain_versioning/grain_versioning.html">Grain versioning</a> in Microsoft Orleans starts off with this </p><p>"👉WARNING: This page describes how to use grain interface versioning. The versioning of Grain state is out of scope." 
</p><p>With a stateful system, handling multiple state versions is a must if we want to deploy new versions without taking the whole system offline. Working in F# we have some power tools in our belt, and for dealing with multiple versions we decided to try out <a href="https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/discriminated-unions">Discriminated Unions</a>, and they worked out great. One can of course apply the same technique in C# with a few more checks.</p><h3 id="tl-dr-">tl;dr;</h3><p>The short version (pun intended) is: <br>a discriminated union with one entry per version, plus a module capable of upgrading old versions as they are loaded from storage. Writing state always writes the latest version. </p><h3 id="example">Example</h3><p>Let's say we have a state where we wish to go from a single image to an array of images; just changing the field type would not work. Let's call this ImageState.</p><p>The goal is for our grain implementation not to worry too much about versions. To do this we separate the in-memory representation (the state, in our case called <code>ImageState</code>) from the DTO representation called <code>ImageStateDto</code>.</p><p>The in-memory representation which the grain works with, <code>ImageState</code>, has a simple list of image URIs:</p><pre><code class="language-F#">type ImageState = { images: Uri list }</code></pre><p>Our stored representation of this can be either the new version or the old version:</p><pre><code class="language-F#">type ImageStateDtoV1 = { image: Uri option }
type ImageStateDtoV2 = { images: Uri list}
type ImageVersionedStateDto = 
    | V1 of ImageStateDtoV1
    | V2 of ImageStateDtoV2</code></pre><!--kg-card-begin: html--><div style="background-color: #CD5C5C; padding: 5px"><strong>Update:</strong><br> Be sure to start with at least two cases and test with your binary serializer. With the Orleans default binary serializer, single-case discriminated unions will yield a different format and hence fail deserialization.</div><!--kg-card-end: html--><p>Since Orleans must be able to create new, empty versions of the state object, we'll wrap this with a class and also add a module which can convert from DTO to state and from state to DTO. </p><pre><code class="language-F#">type ImageStateDto(state: ImageVersionedStateDto) =
    member val State = state with get, set
    new () = ImageStateDto({ images = []} |&gt; V2)

module ImageStateDto =
    let fromDto (dto:ImageStateDto)  =
        match dto.State with
        | V1 dto -&gt; 
            match dto.image with
            | Some i -&gt; { ImageState.images = [i]}
            | None -&gt; { ImageState.images = []}
        | V2 dto -&gt; 
            { ImageState.images = dto.images}
    let toDto (state:ImageState) : ImageStateDto =
        { ImageStateDtoV2.images = state.images} |&gt; V2 |&gt; ImageStateDto</code></pre><p>The <code>ImageStateDto</code> is the class which will be injected into the grain constructor as <code>IPersistentState&lt;ImageStateDto&gt;</code> . Inside the grain we wrap the IPersistentState to read the state using the <code>fromDto</code> method. Instead of calling <code>WriteStateAsync</code> directly we have our own internal method using <code>ImageStateDto.toDto</code> before calling WriteStateAsync.  A sample grain may look like this</p><pre><code class="language-F#">type ProfileImage =
    inherit Grain
    
    val private persistence : IPersistentState&lt;ImageStateDto&gt; 

    new (persistence) = {
        persistence = persistence
    }
    member this.State 
        with get() = this.persistence.State |&gt; ImageStateDto.fromDto
    
    member private this.WriteStateAsync(state)  =
        this.persistence.State &lt;- state |&gt; ImageStateDto.toDto
    this.persistence.WriteStateAsync()</code></pre><p>It may seem a bit tedious, but we've found this to work very well for our needs. </p>]]></content:encoded></item><item><title><![CDATA[Deploying Orleans to Kubernetes]]></title><description><![CDATA[An overview of how we do continuous delivery of a stateful system based on the actor model.]]></description><link>https://ulriksenblog.azurewebsites.net/deploying-orleans-to-kubernetes/</link><guid isPermaLink="false">5e2c48192807240001af2d74</guid><category><![CDATA[Orleans]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Sat, 01 Feb 2020 19:12:49 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/01/nola.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/nola.jpg" alt="Deploying Orleans to Kubernetes"><p>We've been running Microsoft Orleans in Kubernetes for nearly 2 years, and for the last half year or so we've had full continuous delivery whenever we commit to the master branch on GitHub. <br><br>On GitHub there is an <a href="https://github.com/dotnet/orleans/issues/6178#">issue</a> to gather details on Orleans production usage, and recently k8s popped up in the chat for the <a href="https://www.youtube.com/watch?v=an87dzgAMR8">latest virtual meetup</a>. A safe and sound deployment regime is the basis for running Orleans in production, so documenting how we do this will be my first contribution. How we do our deploys touches on multiple subjects, such as <a href="https://blog.ulriksen.net/state-versioning-in-orleans/">state versioning</a>, testing, feature toggles and metrics. In this blog post I will give an introduction to our application, our continuous delivery pipeline and how we deploy to Azure Kubernetes Service (AKS) without downtime. 
</p><h3 id="deployables">Deployables</h3><p>Our application consists of 3 deployables: an API running together with Orleans (the direct client is quite a bit faster), a service processing write messages from the API and another service processing messages from our internal systems. From a deployment perspective, an important detail is the use of Azure Service Bus.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://imgr.whimsical.com/TRaEpE7xZxzR9RBXS2WE3h/png?objects=BoWzL3L87HDDfrQ8wKP9yj%2CYLME9y1VAzBeRaTHpPtwFR%2C3kckr8zu1EwFUG5Xg78F51%2CNDPX976h7BFqV5W9UeTZfv%2CVqHeJRbAgdoiBkKEVPuo6c%2CRF85qNVxpNPq5L98kMDNcM%2CTFDUvtA4dKp1oS5RoZEDs2%2CN79vFBKn9ck7XETsFGZLh1%2CKU8caTE7vZeDvYKeHTifJA%2CSnuYPgf1SNwpzcq2RBHzuQ%2CBQfRiKEy7WmR4JEZfaUYeV&amp;scale=2&amp;stag=5149" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>Deployables</figcaption></figure><p>Users perform both read and write operations using the API; all writes go through Azure Service Bus before being picked up and persisted (we're considering writing directly by default and only using Service Bus in case of write failures).</p><!--kg-card-begin: html--><div style="background-color: #FFFFE0; padding: 5px"><strong>Update:</strong><br> We do not use the Orleans.Clustering.Kubernetes provider, we use AzureStorageClustering. The clustering provider is just for reading and writing cluster information. It is not related to the actual hosting model.</div><!--kg-card-end: html--><h3 id="deployment-pipeline">Deployment pipeline</h3><p>The deploy pipeline is rather straightforward, and fully automated from the point a pull request is merged to master. The deployment goes through two environments before our production environment. The first environment we deploy to, stage, has data we can modify to provoke failures. The next environment is our pre-production, where data is replicated from our production environment at regular intervals. 
Finally we have our production environment. We currently run in one data center only, as the service provides added value, not primary functionality. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://imgr.whimsical.com/TRaEpE7xZxzR9RBXS2WE3h/png?objects=CTBHMf3xZ5WRndR9RntcfQ%2C6KzSb6f9Ha3gidhjWN4svs%2CXqXjsUF5VhgiyyJmNAins8%2CL5poEBcdJEifpaCcsebwGA%2CKMESvgKuKcRUk4Vc28rYZt%2CNGBC5DPQUJzeCXxsSfj6Lc%2CALAeYZbLYhHKQAUaRt2RHi%2CWnirrpuRS1kDjqfcvBGWCc%2CLD5qg9e5F1SfKZdYTycSfG%2CLmEmaUHREhH9jmx9gqsQBe%2C9d7xKfZskyNHBnFE6n2wpA%2C8K49H9PcxWdSEoyoTzQgDZ&amp;scale=2&amp;stag=5084" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>CI/CD Pipeline with GitHub, TeamCity, Octopus Deploy, Azure Container Service and Azure Kubernetes Service</figcaption></figure><ol><li>When something is committed to master, TeamCity (TC) will trigger a build using our Docker build container</li><li>TC will run unit and integration tests</li><li>TC will upload the 3 images to Azure Container Registry</li><li>TC will upload a support package to Octopus which contains our deployment script, config file templates, etc.</li><li>Octopus will create deployment-started annotations in Application Insights</li><li>Octopus Deploy will execute the deployment script against Kubernetes</li><li>After deployment Octopus will create annotations in Application Insights and notify on Slack</li></ol><h3 id="the-deployment">The deployment </h3><p>Orleans supports <a href="https://www.martinfowler.com/bliki/BlueGreenDeployment.html">blue-green deployments</a>, with multiple versions of the same grain type, given that the interfaces are versioned, as well as state versioning if needed. See the Orleans documentation <a href="https://dotnet.github.io/orleans/Documentation/grains/grain_versioning/grain_versioning.html">here</a>. 
I've written about our state versioning technique in a separate <a href="https://blog.ulriksen.net/state-versioning-in-orleans/">blog post</a>.</p><p>When deploying Orleans to Kubernetes I would either go for a single <a href="https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/">StatefulSet</a> or multiple deployments. We chose the latter as we also version our Service Bus topics. Note that both methods will require that </p><p>When we start rolling out a deployment we place an annotation in Application Insights, making it easy to correlate any anomalies with the deployments. The deployment also contains a link to our internal release note, which has a list of all pull requests since the previous version. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/image-1.png" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>Annotations in Application Insights: blue is start of deployment and green is posted on success. Red if it fails</figcaption></figure><p>Before the deployment we have a service and the apps in V1, and an ingress which we'll reuse. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/image.png" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>Before the deployment</figcaption></figure><p>During deploy we add the new version and a new temporary ingress for smoke tests. The tests verify that our API responds as expected. Note the dashed line in the diagram below: once the new silos are in the cluster they will start receiving traffic. Also, we have some stateless grains which read their data from a stateful grain; chances are our tests will hit stateless grains in the new silo which will load their data from v1 grains, not v2. If we were to verify this as well we would need to shut down both versions and fire up v2 alone. 
This is not something we can do in production, but it is possible in other environments. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://imgr.whimsical.com/TRaEpE7xZxzR9RBXS2WE3h/png?objects=QGEEGztjvSJDiMX7nU6k4%2C5EYv4Umh1GTHrPwBKEsRbc%2CJRhn6dZpsW6izJZUJGGpuq%2CN5QBY3UMCnpf1xRJC7KH6a%2CPiECsAdpgH8GNrNAWBrYQw%2C6mEgXuM5ecNL7zi7ZiksQD%2CYQM9tJbVXeLSmL2gm8Lcaa%2CCGEdiAQ4WdaUhd3SEeNNf1%2CCQkpp4kqCUhzy5U2vyB5j7%2C3DnjoxeVwY4t8ecE3i4JKs%2CHqm35jQVMR8wrvUbA8g4xE%2CDPmnMei2nwE9CGQMdH21ng%2CBZhPxd58DDMAdTABUPfPkA%2CPiZoD2qzaR9jVTL5VM5igR%2C4Cppf44RACJNGz7t1qGnKx%2CBgpBrtDUZ4FZr3G9SnQUUG%2CzJdCygTc5E9bEa7jsP5wz%2CXx4NqwrX3sj6jMW3k1cqrn%2C3EHkuCbP2LYhmYMTH7f55B%2CLb4oWFpGjGD2GbJjdKRfm9&amp;scale=2&amp;stag=5324" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>During deploy</figcaption></figure><p>When we're satisfied with our tests we'll delete the smoke test ingress and switch the old ingress from v1 to v2. We'll then start shutting down the old silos by reducing the replica count one by one, waiting until each silo is unloaded before shutting down the next one.</p><p>We finish off the deployment by placing a deployment-completed marker in Application Insights.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://imgr.whimsical.com/TRaEpE7xZxzR9RBXS2WE3h/png?objects=QGEEGztjvSJDiMX7nU6k4%2C5EYv4Umh1GTHrPwBKEsRbc%2CJRhn6dZpsW6izJZUJGGpuq%2C6mEgXuM5ecNL7zi7ZiksQD%2CXbmwLDHcUGdC44uhkmLpKa%2CYQM9tJbVXeLSmL2gm8Lcaa%2CCQkpp4kqCUhzy5U2vyB5j7%2CBZhPxd58DDMAdTABUPfPkA%2CPiZoD2qzaR9jVTL5VM5igR%2C4Cppf44RACJNGz7t1qGnKx&amp;scale=2&amp;stag=5328" class="kg-image" alt="Deploying Orleans to Kubernetes"><figcaption>After deploy</figcaption></figure><h3 id="failures">Failures</h3><p>On the off chance our smoke tests fail, we start taking down the new silos one by one. 
This goes without any serious problems in production, as we have already rolled out through 2 other environments (we've had to wipe staging a few times).</p><h3 id="do-you-ever-have-to-do-a-full-shutdown-during-deploy">Do you ever have to do a full shutdown during deploy?</h3><p>Yes, we have one grain where we did not bother versioning the state. It means that whenever we change this grain we have to stop all silos and spin up the new ones, resulting in downtime. This is automated in our deploy script, and is based on a label on the pull requests in GitHub. We should probably start versioning that state as well to get out of it. The other case we have seen is when we have incompatibilities in the request context, causing everything to blow up.</p><h3 id="what-could-be-better">What could be better? </h3><p>Both the warmup and shutdown of silos, perhaps. It would be nice to have a silo join the cluster and only route specific grains/calls to it, just to keep it free from production traffic before our smoke tests are done. The same goes for shutdown: when our tests are fine I would very much like the existing silos to "empty" themselves of grains before shutting down, not receiving any new activations. It probably works like this already, but we do see a handful of exceptions during shutdown and it would be nice to be able to do shutdowns without any exception entries in our logs. 
I have thought about writing a custom placement director to do this, but so far it is just a thought.</p>]]></content:encoded></item><item><title><![CDATA[Understanding grain references and UniqueKey for Orleans CosmosDB provider]]></title><description><![CDATA[An explanation of the Grain Reference in Microsoft Orleans]]></description><link>https://ulriksenblog.azurewebsites.net/orleans-grain-references/</link><guid isPermaLink="false">5e2472562807240001af2c97</guid><category><![CDATA[Orleans]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 21 Feb 2019 12:10:00 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/martin-adams-_OZCl4XcpRw-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/martin-adams-_OZCl4XcpRw-unsplash.jpg" alt="Understanding grain references and UniqueKey for Orleans CosmosDB provider"><p>As a part of enabling custom partition keys in the CosmosDB provider I wanted to get a better understanding of the id used in the documents stored in CosmosDB, so I decided I could just as well do a write-up of it as part of the learning experience.</p>
<p>Here's an example of an id</p>
<pre><code>b780de47-a7d2-44c5-8a8f-7eea29e9561f__GrainReference=4fb7889c000475dcf1a50290320ead970600000000000000+13f58135-de2b-4ec7-8f5b-ae609c6d8cbb
</code></pre>
<p>After a bit of digging it's not as bad as it looks: the pattern is <code>{ClusterServiceId}__{GrainReference.KeyString}</code>. The GrainReference.KeyString has four different formats:</p>
<ul>
<li>Observer reference = <code>GrainReference={GrainId} ObserverId={ObserverId}</code></li>
<li>System target = <code>GrainReference={GrainId} SystemTarget={Silo Address}</code></li>
<li>Generic argument = <code>GrainReference={GrainId} GenericArguments={genericArguments}</code></li>
<li>Grain reference = <code>GrainReference={GrainId}</code></li>
</ul>
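<p>As a quick sanity check (a sketch of mine, not part of the provider itself), the example id from above can be split along that pattern:</p>

```python
# Split the example id into its {ClusterServiceId}__{GrainReference.KeyString} parts.
raw = ("b780de47-a7d2-44c5-8a8f-7eea29e9561f__"
       "GrainReference=4fb7889c000475dcf1a50290320ead970600000000000000"
       "+13f58135-de2b-4ec7-8f5b-ae609c6d8cbb")

service_id, key_string = raw.split("__", 1)
# With composite grain keys, the string extension follows a '+' sign.
grain_id, _, key_extension = key_string.removeprefix("GrainReference=").partition("+")

print(service_id)     # b780de47-a7d2-44c5-8a8f-7eea29e9561f
print(grain_id)       # 4fb7889c000475dcf1a50290320ead970600000000000000
print(key_extension)  # 13f58135-de2b-4ec7-8f5b-ae609c6d8cbb
```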
<p>In our case we're looking at grain references. The GrainId is a string created by calling <code>GrainId.ToParsableString</code>, which in turn calls <code>UniqueKey.ToHexString()</code>.</p>
<p>So this seems straightforward, right? Well, almost. The id <code>4fb7889c000475dcf1a50290320ead970600000000000000+13f58135-de2b-4ec7-8f5b-ae609c6d8cbb</code> is from a grain created by calling <code>GetGrain(new Guid(&quot;000475dc-889c-4fb7-97ad-0e329002a5f1&quot;), &quot;13f58135-de2b-4ec7-8f5b-ae609c6d8cbb&quot;);</code></p>
<p>Last part first: when using composite grain keys the last part is preserved as-is, after a + sign. The first part is a bit more of a puzzle. To get to this we have to look at the implementation in <a href="https://github.com/dotnet/orleans/blob/master/src/Orleans.Core.Abstractions/IDs/UniqueKey.cs">UniqueKey</a>. Provided we're after a grain reference, the call chain from GrainReference is <code>GrainReference.ToKeyString -&gt; GrainId.ToParsableString -&gt; UniqueKey.ToHexString</code>. The string before the <code>+</code> sign is built with the following format: <code>s.AppendFormat(&quot;{0:x16}{1:x16}{2:x16}&quot;, N0, N1, TypeCodeData);</code> (x16 formats to a hexadecimal string of 16 digits).</p>
<p>Starting from the end again, typecode is used to identify the Grain implementation with any generic parameters (it can be overridden by <a href="https://github.com/dotnet/orleans/blob/b279a53197a0de2926eb9fae312bc9e66cd117e1/src/Orleans.Core.Abstractions/CodeGeneration/CodeGenGrainAttributes.cs#L10">TypeCodeOverrideAttribute</a>). The typecode is a SHA256 hash based on the <a href="https://github.com/dotnet/orleans/blob/b279a53197a0de2926eb9fae312bc9e66cd117e1/src/Orleans.Core/CodeGeneration/GrainInterfaceUtils.cs#L429">full name of the class with generic arguments</a>.</p>
<p>All we're left with is the first two longs, which for our case are easy enough. When the UniqueKey is initialized, the Guid is converted to two unsigned 64-bit integers:</p>
<pre><code>var n0 = BitConverter.ToUInt64(guidBytes, 0);
var n1 = BitConverter.ToUInt64(guidBytes, 8);
</code></pre>
<p>And that's it.</p>
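<p>To tie it together, here is a quick sketch of mine (Python, for illustration only) that reproduces the hex part of the example id from the Guid. Note the assumption: the TypeCodeData field is copied verbatim from the example id rather than computed from the class-name hash.</p>

```python
import uuid

def grain_key_hex(guid_str, type_code_data):
    # Guid.ToByteArray() has the same layout as uuid.bytes_le; UniqueKey reads
    # two little-endian UInt64s (n0, n1) from it, then formats
    # "{0:x16}{1:x16}{2:x16}" with n0, n1 and TypeCodeData.
    b = uuid.UUID(guid_str).bytes_le
    n0 = int.from_bytes(b[0:8], "little")
    n1 = int.from_bytes(b[8:16], "little")
    return f"{n0:016x}{n1:016x}{type_code_data:016x}"

# TypeCodeData (0x0600...) taken from the example id, not recomputed.
print(grain_key_hex("000475dc-889c-4fb7-97ad-0e329002a5f1", 0x0600000000000000))
# -> 4fb7889c000475dcf1a50290320ead970600000000000000
```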
<p>There's some discussions on this when searching Gitter, and there's some proposals to change how this works <a href="https://github.com/dotnet/orleans/issues/1123">https://github.com/dotnet/orleans/issues/1123</a> and <a href="https://github.com/dotnet/orleans/issues/3049">https://github.com/dotnet/orleans/issues/3049</a> and <a href="https://github.com/dotnet/orleans/issues/1121#issuecomment-303951424">https://github.com/dotnet/orleans/issues/1121#issuecomment-303951424</a>.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Easy scaling with Cosmos DB for large loads]]></title><description><![CDATA[Learn how to scale Cosmos DB in an easy and transparent way using the decorator pattern.]]></description><link>https://ulriksenblog.azurewebsites.net/easy-scaling-documentdb-for-large-loads/</link><guid isPermaLink="false">5e2472562807240001af2c95</guid><category><![CDATA[scalability]]></category><category><![CDATA[DocumentDB]]></category><category><![CDATA[Cosmos DB]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Tue, 04 Jul 2017 18:41:15 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/akshay-nanavati-8FevUlxdZC0-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/akshay-nanavati-8FevUlxdZC0-unsplash.jpg" alt="Easy scaling with Cosmos DB for large loads"><p>Working with a customer, we need to load a batch of 40-50K documents to Cosmos DB. We could let this pass, but it would result in quite a few HTTP 429 RequestRateTooLarge responses. The operations would be retried, but it would also hurt other clients accessing the same data, potentially causing a cascade of 429s and degraded performance for other clients, with real humans waiting.</p>
<p>To prevent this we can scale up to the required request units. This is easily done through code</p>
<pre><code>offer = _documentClient.CreateOfferQuery().Where(o =&gt; o.ResourceLink == collection.SelfLink).AsEnumerable().Single();

await _documentClient.ReplaceOfferAsync(new OfferV2(offer,   HighThroughputRequestUnits));
</code></pre>
<p>After processing our batch we can scale down again using the same technique.</p>
<p>Provided we have an IBatchJob interface, we can then use the <a href="https://en.wikipedia.org/wiki/Decorator_pattern">decorator pattern</a> to wrap our batch job in a HighThroughputBatchJobDecorator, which will scale up Cosmos DB, execute the original BatchJob and scale it down again.</p>
<pre><code>public class HighThroughputBatchJobDecorator : IBatchJob
{
	private const int HighThroughputRequestUnits = 10000;

	private IBatchJob _batchJob;
	private IDocumentClient _documentClient;
    private IDocumentDBConfig _config;

    public HighThroughputBatchJobDecorator(IBatchJob batchJob, IDocumentClient documentClient, IDocumentDBConfig config)
    {
        _batchJob = batchJob;
        _documentClient = documentClient;
        _config = config;
    }
    public async Task RunAsync(Uri feedUri, XElement feedXml)
    {
        Offer offer = null;

        try
        {
            DocumentCollection collection = _documentClient.CreateDocumentCollectionQuery(UriFactory.CreateDatabaseUri(_config.DatabaseName))
            .Where(c =&gt; c.Id == _config.CollectionId).ToArray().Single();

            offer = _documentClient.CreateOfferQuery().Where(o =&gt; o.ResourceLink == collection.SelfLink).AsEnumerable().Single();
            await _documentClient.ReplaceOfferAsync(new OfferV2(offer, HighThroughputRequestUnits));

            Trace.WriteLine(&quot;Running with high throughput&quot;);
    
            await _batchJob.RunAsync(feedUri, feedXml);
        }
        finally    
        {
            if (offer != null)
            {
            	// Scale down
                await _documentClient.ReplaceOfferAsync(offer);

                Trace.WriteLine(&quot;Collection scaled down from high throughput&quot;);
            }
        }
    }
}
</code></pre>
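<p>The essential guarantee here is that the <code>finally</code> block always restores the original offer, even when the batch job throws. The same shape, reduced to a language-neutral sketch (Python; the <code>scaler</code> object and its <code>set_throughput</code> method are hypothetical stand-ins for the offer calls above):</p>

```python
class HighThroughputJobDecorator:
    """Decorator: scale up, run the wrapped job, always scale back down.

    `scaler.set_throughput(ru)` is a hypothetical stand-in for
    ReplaceOfferAsync; it returns the previous throughput so we can restore it.
    """
    def __init__(self, job, scaler, high_ru=10_000):
        self._job = job
        self._scaler = scaler
        self._high_ru = high_ru

    def run(self, *args):
        previous_ru = self._scaler.set_throughput(self._high_ru)  # scale up
        try:
            return self._job.run(*args)
        finally:
            # Mirrors the C# finally block: scale down even on failure.
            self._scaler.set_throughput(previous_ru)
```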
<p>Pretty neat ;)</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Default indexing in Cosmos DB]]></title><description><![CDATA[Creating Cosmos DB collections in .NET SDK yields different indexing settings than the portal. This post shows you how to create one with default indexing]]></description><link>https://ulriksenblog.azurewebsites.net/default-indexing-in-cosmos-db/</link><guid isPermaLink="false">5e2472562807240001af2c94</guid><category><![CDATA[DocumentDB]]></category><category><![CDATA[Cosmos DB]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 22 Jun 2017 10:53:48 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/kolar-io-lRoX0shwjUQ-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/kolar-io-lRoX0shwjUQ-unsplash.jpg" alt="Default indexing in Cosmos DB"><p>I recently needed some paging functionality in Cosmos DB; since skip and take are not directly supported, we're using cursor-based paging based on a date-time. While testing my repository I suddenly did not get the results I expected. After a little bit of trial and error with the Query Explorer in the portal I noticed something important:</p>
<blockquote>
<p>Creating collections with the SDK will not use the same index configuration as collections created through the portal.</p>
</blockquote>
<p>I had added code to create the collection automatically, and after testing this my collection used the &quot;old&quot; index settings.</p>
<p>I have blogged about this earlier, <a href="http://blog.ulriksen.net/indexing-changes-in-documentdb/">http://blog.ulriksen.net/indexing-changes-in-documentdb/</a>, but had totally forgotten about it since then.</p>
<p>So, how do you create a new collection with default indexing using the .NET SDK?</p>
<script src="https://gist.github.com/Ulriksen/d04575733fea468226d1e3542c67b87f.js"></script>
<p>I did a shout out to the Azure Cosmos DB team, and they confirmed this will change</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">And yes, the plan is to change the SDK defaults so that they&#39;re the same.</p>&mdash; Azure Cosmos DB (@AzureCosmosDB) <a href="https://twitter.com/AzureCosmosDB/status/877595399103553537">June 21, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script><!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Running group select]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>We're currently working on creating a somewhat large XML file where we need to read records from our database, perform some calculations on a small subset of the records based on a key, and then write everything to a blob.</p>
<blockquote>
<p>Services such as Azure functions, which are priced by memory</p></blockquote>]]></description><link>https://ulriksenblog.azurewebsites.net/running-group-select/</link><guid isPermaLink="false">5e2472562807240001af2c93</guid><category><![CDATA[Linq]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 08 Sep 2016 18:48:00 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/neonbrand-vlopP4TuHHk-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/neonbrand-vlopP4TuHHk-unsplash.jpg" alt="Running group select"><p>We're currently working on creating a somewhat large XML file where we need to read records from our database, perform some calculations on a small subset of the records based on a key, and then write everything to a blob.</p>
<blockquote>
<p>Services such as Azure Functions, which are priced by memory consumption, make .ToList() more than a triviality</p>
</blockquote>
<p>As it is now we have excluded a lot of content and can safely read the whole set into memory, but we know that our dataset will grow, and if someone decides to include our archives, we will run out of memory.</p>
<h2 id="enumeratorstotherescue">Enumerators to the rescue</h2>
<p>As we only need a small subset of the data in memory at the same time, we can work with an enumerator, read in batches, perform our operations and then write the data to disk. Fairly easy; however, we needed to do this in more than one place, so I figured I could just as well write it as a general LINQ-ish function.</p>
<blockquote>
<p>NOTE: This only makes sense when you do not want to load the whole dataset into memory and the data is sorted by the key selector.</p>
</blockquote>
<pre><code>IEnumerable&lt;TResult&gt; RunningGroupSelect&lt;TSource, TKey, TResult&gt;(this IEnumerable&lt;TSource&gt; source, Func&lt;TSource, TKey&gt; keySelector, Func&lt;TKey, IEnumerable&lt;TSource&gt;, IEnumerable&lt;TResult&gt;&gt; resultSelector)
</code></pre>
<p>The method signature takes a regular TKey. When enumerating over the source, results are buffered until the output of the keySelector changes. When it changes, the key and the collected buffer are passed to the result selector function. The calling code then enumerates over the result without noticing the buffering.</p>
<p>A use case is calculating availability for a season based on the rights for each episode.</p>
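<p>The same streaming group-then-project behaviour can be sketched in a few lines of Python with <code>itertools.groupby</code>, which, like this method, only buffers one key-group at a time and assumes the source is sorted by the key (the episode data below is made up for illustration):</p>

```python
from itertools import groupby

def running_group_select(source, key_selector, result_selector):
    # Buffers one key-group at a time; source must be sorted by the key,
    # just like the C# implementation.
    for key, group in groupby(source, key=key_selector):
        yield from result_selector(key, list(group))

# Hypothetical use case: per-season availability from per-episode rights.
episodes = [("s1", 3), ("s1", 5), ("s2", 7)]  # (season, days available)
availability = list(running_group_select(
    episodes,
    lambda e: e[0],
    lambda season, eps: [(season, min(days for _, days in eps))],
))
print(availability)  # [('s1', 3), ('s2', 7)]
```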
<pre>
<code>
 public static IEnumerable&lt;TResult&gt; RunningGroupSelect&lt;TSource, TKey, TResult&gt;(this IEnumerable&lt;TSource&gt; source, Func&lt;TSource, TKey&gt; keySelector, Func&lt;TKey, IEnumerable&lt;TSource&gt;, IEnumerable&lt;TResult&gt;&gt; resultSelector)
        {
            return new GroupedResultIterator&lt;TSource, TKey, TResult&gt;(source.GetEnumerator(), keySelector, resultSelector);
        }
    }
    
    internal sealed class GroupedResultIterator&lt;TSource, TKey, TResult&gt; : IEnumerable&lt;TResult&gt;, IEnumerator&lt;TResult&gt;
    {
        private enum State
        {
            New,
            Open,
            SourceEOF,
            Disposed
        }
        private State currentState;
        private readonly IEnumerator&lt;TSource&gt; _source;
        private readonly Func&lt;TSource, TKey&gt; _keySelector;
        private readonly Func&lt;TKey, IEnumerable&lt;TSource&gt;, IEnumerable&lt;TResult&gt;&gt; _resultSelector;
        private TResult[] groupResult;

        private int position;

        public GroupedResultIterator(IEnumerator&lt;TSource&gt; source, Func&lt;TSource, TKey&gt; keySelector, Func&lt;TKey, IEnumerable&lt;TSource&gt;, IEnumerable&lt;TResult&gt;&gt; resultSelector)
        {
            if (source == null)
                throw new ArgumentNullException(nameof(source));

            if (keySelector == null)
                throw new ArgumentNullException(nameof(keySelector));

            if (resultSelector == null)
                throw new ArgumentNullException(nameof(resultSelector));

            currentState = State.New;

            position = 0;

            _source = source;
            _keySelector = keySelector;
            _resultSelector = resultSelector;
        }
        
        public TResult Current
        {
            get
            {
                return groupResult[position];
            }
        }

        object IEnumerator.Current
        {
            get
            {
                return Current;
            }
        }
        public IEnumerator<tresult> GetEnumerator()
        {
            return this;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this;
        }

        public void Dispose()
        {          
            groupResult = null;
            currentState = State.Disposed;
        }

        public bool MoveNext()
        {
            if (currentState == State.Disposed)
                return false;

            if (currentState == State.New)
            {
                currentState = State.Open;
                if (!_source.MoveNext())
                    currentState = State.SourceEOF;
            }

            if (groupResult == null || (position == groupResult.Length - 1 && currentState != State.SourceEOF))
            {
                var keyResult = new TResult[0];

                while (keyResult.Length == 0 && currentState != State.SourceEOF)
                    keyResult = GetResultForKey(_keySelector(_source.Current));

                groupResult = keyResult;
                position = -1;
            }

            if (currentState == State.SourceEOF && groupResult.Length == 0 || position == groupResult.Length - 1)
                return false;

            position++;
            return true;
        }

        public void Reset()
        {
            throw new NotImplementedException();
        }

        private TResult[] GetResultForKey(TKey key)
        {
            var buffer = new List<tsource>();

            if (key == null)
            {
                buffer.Add(_source.Current);
                if (!_source.MoveNext())
                {
                    currentState = State.SourceEOF;
                }
            }
            else
                while (key.Equals(_keySelector(_source.Current)))
                {
                    buffer.Add(_source.Current);
                    if (!_source.MoveNext())
                    {
                        currentState = State.SourceEOF;
                        break;
                    }
                }
            return _resultSelector(key, buffer).ToArray();
        }
    }
</tsource></tresult></tresult></tkey,></tsource,></tsource></tresult></tkey,></tsource,></tsource></tresult></tresult></tsource,></tsource,></tresult></tkey,></tsource,></tsource></tsource,></tresult></code>
</pre>
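<p>For intuition, the consecutive-run semantics (as opposed to LINQ's GroupBy, which groups across the whole sequence) can be sketched in Python, where itertools.groupby buffers adjacent items with equal keys in exactly this way. The season/episode data below is invented for illustration.</p>

```python
# Python analogy for RunningGroupSelect: itertools.groupby only groups
# *adjacent* items with equal keys, buffering one run at a time.
from itertools import groupby

def running_group_select(source, key_selector, result_selector):
    """Buffer consecutive items sharing a key, then hand (key, buffer) to result_selector."""
    for key, run in groupby(source, key=key_selector):
        yield from result_selector(key, list(run))

# Episodes of two seasons, already ordered by season (made-up data).
episodes = [("s1", 1), ("s1", 2), ("s2", 1)]
availability = list(running_group_select(
    episodes,
    key_selector=lambda e: e[0],
    result_selector=lambda season, eps: [(season, len(eps))]))
# availability == [("s1", 2), ("s2", 1)]
```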
<p>I have a hunch that this can just as well be solved with TPL Dataflow or Rx, but for now it seems sufficient.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Notes on the new DocumentDB partitioning and pricing]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>The new pricing scheme and partitioning functionality for DocumentDB, made available at Build, was a significant improvement to the DocumentDB offering.</p>
<p>I learned about this a few weeks beforehand, but I did not fully see the benefits until I started looking at how this would affect our current solution.</p>
<h3 id="recap">Recap</h3>]]></description><link>https://ulriksenblog.azurewebsites.net/notes-on-documentdb-partitioning/</link><guid isPermaLink="false">5e2472562807240001af2c92</guid><category><![CDATA[Azure]]></category><category><![CDATA[DocumentDB]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Tue, 24 May 2016 18:09:54 GMT</pubDate><media:content url="https://ulriksenblog.azurewebsites.net/content/images/2020/02/gabriel-sollmann-Y7d265_7i08-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/02/gabriel-sollmann-Y7d265_7i08-unsplash.jpg" alt="Notes on the new DocumentDB partitioning and pricing"><p>The new pricing scheme and partitioning functionality for DocumentDB, made available at Build, was a significant improvement to the DocumentDB offering.</p>
<p>I learned about this a few weeks beforehand, but I did not fully see the benefits until I started looking at how this would affect our current solution.</p>
<h3 id="recap">Recap</h3>
<p>With the old pricing model, one could choose between three performance levels per collection, from 250 to 2500 Request units.  Each collection could store a maximum of 10GB of data. This constraint is part of what drives the scalability of DocumentDB. One was forced to prepare the data for sharding. Combined with the predictable request charge of queries it was possible to calculate the required collection count, size and cost. However, should the need for re-sharding occur, it would be a manual task and it would also require a strategy for handling reads and writes during the re-partitioning of the data. Not to mention that this operation would require RUs as well, often on a system under stress.</p>
<p>With this pricing scheme my recommendation was to aim for S1 collections at 1K RU for normal operations, scaling down to S0 during night/low traffic periods and leaving the S2 for high traffic usage or re-sharding.</p>
<h3 id="enterpartitionedcollections">Enter Partitioned collections</h3>
<p>So, instead of letting you deal with this, the DocumentDB team created partitioned collections. In reality it's just a bunch of collections, but instead of paying per collection you pay for the storage used and the request units needed. You no longer need to think about re-sharding, or about leaving RUs available for that operation.</p>
<blockquote>
<p>Partitioned collections remove a lot of yak hair for operations</p>
</blockquote>
<p>So, what do you need to think of with Partitioned Collections? Well, each partition still has some limits, it's still 10GB and 10K RU and the partition also acts as the transactional and query boundary. What does this mean?</p>
<ul>
<li>You want enough partition keys, with a meaningful distribution</li>
<li>Make sure your data is evenly distributed over your keys.</li>
<li>Make sure to query with partition key</li>
</ul>
<p>With just a few keys you will not write to all collections. With a bad spread of keys, you risk having a hot partition. And if a significant amount of your data is tied to a single partition key, you will have a problem.</p>
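<p>A quick way to reason about key spread is simply to count documents per candidate partition key before committing to it. A minimal sketch (the key names and document counts below are invented for illustration, not from DocumentDB):</p>

```python
# Sanity check for partition key spread: count documents per key
# to spot hot partitions before choosing the key.
from collections import Counter

def partition_distribution(docs, key_of):
    """Count documents per partition key value."""
    return Counter(key_of(d) for d in docs)

# Made-up corpus: one user owns most of the data.
docs = [{"user": "alice"}] * 80 + [{"user": "bob"}] * 10 + [{"user": "carol"}] * 10
dist = partition_distribution(docs, lambda d: d["user"])
hottest_key, hottest_count = dist.most_common(1)[0]
# "alice" holds 80% of the documents: a classic hot-partition smell.
```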
<p>The request charge for a query will drastically increase if you run multi-partition queries, as the query is parsed and executed per collection. One can experience as much as a 25x increase in RU cost for unbounded queries.</p>
<p>But overall, this change in pricing and functionality is great! These are problems you would have had with the old model as well; it's just that you would have had to deal with a lot more yak shaving on top.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Indexing changes in DocumentDB]]></title><description><![CDATA[<!--kg-card-begin: markdown--><blockquote>
<p>TL;DR;<br>Hash indexing is dead, long live Range indexing</p>
</blockquote>
<p>I recently did a <a href="https://azure.microsoft.com/nb-no/services/documentdb/">DocumentDB</a> talk for NNUG in Oslo, and made a discovery during my preparation of a demo on <a href="https://azure.microsoft.com/nb-no/documentation/articles/documentdb-indexing-policies/">indexing</a>.</p>
<p>The default index configuration when creating a collection in the portal differs from creating one with the API</p>]]></description><link>https://ulriksenblog.azurewebsites.net/indexing-changes-in-documentdb/</link><guid isPermaLink="false">5e2472562807240001af2c91</guid><category><![CDATA[DocumentDB]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Fri, 29 Apr 2016 07:10:01 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><blockquote>
<p>TL;DR;<br>Hash indexing is dead, long live Range indexing</p>
</blockquote>
<p>I recently did a <a href="https://azure.microsoft.com/nb-no/services/documentdb/">DocumentDB</a> talk for NNUG in Oslo, and made a discovery during my preparation of a demo on <a href="https://azure.microsoft.com/nb-no/documentation/articles/documentdb-indexing-policies/">indexing</a>.</p>
<p>The default index configuration when creating a collection in the portal differs from creating one with the API, and also from the documentation. Both the API and the documentation follow the &quot;old&quot; indexing settings[1], but these show up as Custom in the portal.</p>
<p>When creating a collection using the portal, the new default is Range for both strings and numbers[2]. It will also display as the default settings in the portal.</p>
<p><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/defaultindexing.png" alt="defaultindexing"><br>
Another observation is the new default indexing for the Point datatype: points will now be indexed by default for <a href="https://azure.microsoft.com/nb-no/documentation/articles/documentdb-geospatial/">spatial</a> searches.</p>
<p>I've reached out to the DocumentDB team, and apparently they are changing the indexing a bit. As far as I understand, they have optimized the storage for Range so that the difference between Hash and Range is very small. This may lead to the deprecation of Hash indexing. That is a good thing: a simpler product means fewer developer surprises and less frustration.</p>
<p>In my dream world, this is a step on the road to skip and take functionality. The DocumentDB team closely monitors our wishlist, so please take the time to upvote skip/take <a href="https://feedback.azure.com/forums/263030-documentdb/suggestions/6350987--documentdb-allow-paging-skip-take">here</a>.</p>
<p>[1] The old/API indexing settings:</p>
<pre><code>{
  &quot;indexingMode&quot;: &quot;consistent&quot;,
  &quot;automatic&quot;: true,
  &quot;includedPaths&quot;: [
    {
      &quot;path&quot;: &quot;/*&quot;,
      &quot;indexes&quot;: [
        {
          &quot;kind&quot;: &quot;Range&quot;,
          &quot;dataType&quot;: &quot;Number&quot;,
          &quot;precision&quot;: -1
        },
        {
          &quot;kind&quot;: &quot;Hash&quot;,
          &quot;dataType&quot;: &quot;String&quot;,
          &quot;precision&quot;: 3
        }
      ]
    }
  ],
  &quot;excludedPaths&quot;: []
}
</code></pre>
<p>[2] The new indexing settings:</p>
<pre><code>{
  &quot;indexingMode&quot;: &quot;consistent&quot;,
  &quot;automatic&quot;: true,
  &quot;includedPaths&quot;: [
    {
      &quot;path&quot;: &quot;/*&quot;,
      &quot;indexes&quot;: [
        {
          &quot;kind&quot;: &quot;Range&quot;,
          &quot;dataType&quot;: &quot;Number&quot;,
          &quot;precision&quot;: -1
        },
        {
          &quot;kind&quot;: &quot;Range&quot;,
          &quot;dataType&quot;: &quot;String&quot;,
          &quot;precision&quot;: -1
        },
        {
          &quot;kind&quot;: &quot;Spatial&quot;,
          &quot;dataType&quot;: &quot;Point&quot;
        }
      ]
    }
  ],
  &quot;excludedPaths&quot;: []
}
</code></pre>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Benefits of having an open source project]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>During December last year I created an open source project called DoubleCache. It's a cleaner version of a cache pattern we use at work, written from scratch.</p>
<h3 id="tldr">TL;DR</h3>
<ul>
<li>Great source for a conference talk</li>
<li>Helped me learn FAKE and Appveyor</li>
<li>Made me reason about handling pull requests, figuring out</li></ul>]]></description><link>https://ulriksenblog.azurewebsites.net/benefits-of-having-an-open-source-project/</link><guid isPermaLink="false">5e2472562807240001af2c90</guid><category><![CDATA[DoubleCache]]></category><category><![CDATA[OpenSource]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Fri, 01 Apr 2016 17:18:16 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>During Desember last year I created an open source project called DoubleCache. It's a cleaner version of a cache pattern we use at work, written from scratch.</p>
<h3 id="tldr">TL;DR</h3>
<ul>
<li>Great source for a conference talk</li>
<li>Helped me learn FAKE and Appveyor</li>
<li>Made me reason about handling pull requests, figuring out how I prefer to do this (blog post coming).</li>
<li>Showcase skills and thoughts for potential customers and employers.</li>
</ul>
<h3 id="indetails">In details</h3>
<p>The first benefit my open source project gave me was backing up a talk I had planned for the Norwegian .Net user group. The talk was about Redis and our experiences of moving to a centralized cache. Backing the talk with my own implementation provided very clear samples.</p>
<p>Truth be told, DoubleCache wasn't yet public when I did my talk, but it became public a couple of days later. Pushing code out in the open forced me to set up a proper build and test setup, something I might not have done had it been a private project. With an actual need to learn <a href="http://fsharp.github.io/FAKE/">FAKE</a> it was easy to get the Instant Gratification Monkey out of the way<a href="http://waitbutwhy.com/2013/10/why-procrastinators-procrastinate.html">[1]</a>. I prefer <a href="https://xunit.github.io/">xUnit</a> and <a href="https://github.com/shouldly/shouldly">shouldly</a> when writing C#, and this meant I had to dig into FAKE to a degree where I now feel comfortable with it.</p>
<p>It did not take long before I had the first pull request. This one came from a colleague, but nevertheless I learned how to use <a href="https://www.appveyor.com/">AppVeyor</a>. Well, there's not much to learn really when using FAKE. Having all your build configuration under version control is just great.</p>
<p>After yet another talk, I got the first pull request from someone I did not know, <a href="https://github.com/cbigsby">Clayton Sayer</a>. He's done some great things: fixed a few configuration issues, and added delete functionality and synchronous support. After looking at our repository at work and reading a few blog posts, I was sure I wanted single commits when merging pull requests. I needed a way to do this while still being able to make modifications, such as bumping the version number in the build script. More on this in the next blog post.</p>
<p>Finally, putting code out in the open forced me to think things through more than once. Not just because it's public, but because it will serve as part of my resume. It will no doubt be a discussion point when talking to potential customers or future employers.</p>
<p>It has been, and continues to be, a great experience.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Unit testing Redis Lua scripts]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>With great power comes great responsibility. The power of Lua scripting with Redis and what you can achieve using data structures beyond a key/value store is amazing. Starting out is quick and easy, and with some VIM tricks you can easily create an efficient feedback loop.</p>
<blockquote>
<p>But then what,</p></blockquote>]]></description><link>https://ulriksenblog.azurewebsites.net/unit-testing-redis-lua-scripts/</link><guid isPermaLink="false">5e2472562807240001af2c8f</guid><category><![CDATA[Azure]]></category><category><![CDATA[Redis]]></category><category><![CDATA[scalability]]></category><category><![CDATA[Cache]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 17 Mar 2016 18:32:44 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>With great power comes great responsibility. The power of Lua scripting with Redis and what you can achieve using data structures beyond a key/value store is amazing. Starting out is quick and easy, and with some VIM tricks you can easily create an efficient feedback loop.</p>
<blockquote>
<p>But then what, after a few iterations, how do you know that your scripts are still working?</p>
</blockquote>
<p>Well, there's no reason not to include your Lua scripts in a test regime. To do this we can &quot;simply&quot; fire up a Redis server as a part of our tests. Now, you could argue that these are integration tests as we're using a service running out of process and perhaps that's right. But these tests run so fast, and are so specific, that I prefer to run them as a part of my development cycle. They're also executed alongside our regular unit tests, not our integration tests.</p>
<h3 id="tldr">TL;DR</h3>
<p>How does it work? We reference a Redis server, start the server, execute our tests, and shut down.</p>
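<p>The fixture pattern itself is language-agnostic: start the external process once, talk to it over stdin, and shut it down when the fixture is disposed. Here is a rough Python sketch of the same start/exercise/shutdown cycle, with a trivial stand-in command in place of redis-server.exe so the sketch is self-contained:</p>

```python
# Generic start/exercise/shutdown fixture for an external server process.
# The "server" here is a stand-in command, not Redis, so it runs anywhere.
import subprocess
import sys

class ProcessFixture:
    """Start an external server process once; shut it down on dispose."""
    def __init__(self, args):
        self.proc = subprocess.Popen(args, stdin=subprocess.PIPE, text=True)

    def dispose(self):
        if self.proc.poll() is None:                 # still running?
            self.proc.stdin.write("SHUTDOWN\n")      # polite shutdown command
            self.proc.stdin.close()
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.proc.kill()                     # fall back to a hard kill

# Stand-in "server": reads lines until it sees SHUTDOWN, then exits cleanly.
server_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip() == 'SHUTDOWN':\n"
    "        break\n"
)
fixture = ProcessFixture([sys.executable, "-c", server_code])
fixture.dispose()
exit_code = fixture.proc.returncode  # 0 after a clean shutdown
```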
<h3 id="stepbystepcreatingourtestproject">Step by step - creating our test project</h3>
<ol start="0">
<li>Here's our system under test, a small and rather useless service with a Lua script that adds a key value to the cache.
<pre><code>public class SomeCacheService
{
    private IConnectionMultiplexer _redisConnection;

    public SomeCacheService(IConnectionMultiplexer redisConnection)
    {
        _redisConnection = redisConnection;
    }

    public void RunsALuaScript(string value)
    {
        _redisConnection.GetDatabase().ScriptEvaluate(string.Format(&quot;redis.call('set','test','{0}')&quot;, value));
    }
}
</code></pre>
</li>
<li>Create a new class library for your tests.</li>
<li>Use NuGet to add references to Redis-64, StackExchange.Redis, xunit.core, xunit.runner.visualstudio and Shouldly.
<pre><code>Install-Package Redis-64
Install-Package StackExchange.Redis
Install-Package xunit.core
Install-Package xunit.runner.visualstudio
Install-Package Shouldly
</code></pre>
</li>
<li>Create a folder called Redis4. As linked files, add redis-server.exe and redis.windows.conf from .\packages\redis-64.3.0.501\tools\ and set the Copy to output property to "Copy if newer".</li>
<li>To make sure we start the server once for all our tests, we'll use the class and collection fixtures from xUnit. The class fixture is responsible for starting the Redis server, and the collection fixture allows us to run multiple test classes against the same instance of the class fixture. For small and closely related classes I prefer to keep them in the same file. Create a new file called RedisFixture.cs with the following content (the StartRedis helper, which launches the redis-server.exe copied in the previous step, is omitted here):
<pre><code>[CollectionDefinition(&quot;RedisCollection&quot;)]
public class RedisCollectionFixture : ICollectionFixture&lt;RedisFixture&gt;
{
    // Marker class for the collection fixture
}

public class RedisFixture : IDisposable
{
    private static Process redisProcess;

    public RedisFixture()
    {
        if (redisProcess == null || redisProcess.HasExited)
            redisProcess = StartRedis();
    }

    public void Dispose()
    {
        try
        {
            if (redisProcess != null)
            {
                if (redisProcess.HasExited)
                    return;

                redisProcess.StandardInput.WriteLine(&quot;SHUTDOWN&quot;);
                Thread.Sleep(500);

                if (!redisProcess.HasExited)
                    redisProcess.Kill();
                redisProcess.Dispose();
            }
        }
        catch
        {
        }
    }
}
</code></pre>
</li>
<li>We can now write our first test using Redis.
<pre><code>[Collection(&quot;RedisCollection&quot;)]
public class MyLuaScriptTests
{
    [Fact]
    public void RunALuaScriptTest()
    {
        var connection = ConnectionMultiplexer.Connect(&quot;localhost&quot;);
        var service = new SomeCacheService(connection);

        service.RunsALuaScript(&quot;a&quot;);

        var result = (string)connection.GetDatabase().StringGet(&quot;test&quot;);
        result.ShouldBe(&quot;a&quot;);
    }
}
</code></pre>
</li>
<li>The test is now available in the Visual Studio test explorer (it may fail to run if you're using the 2.1.0 version of xunit.runner.visualstudio; in that case, try downgrading to 2.0.1). During the first test execution you may be prompted to adjust firewall settings, as Redis binds to a TCP port.</li>
</ol>
<p>You're now ready to write Lua scripts for Redis in a safe and sound manner.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Turn up the beat, heart rate of a speaker]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Beginning of January I was contacted by the Bitshift meetup organizers, asking if I could come over and do a talk or two. Having two talks more or less prepared I decided that it could be a fun experience and also great practice. I gave a talk on caching (<a href="http://www.slideshare.net/HaraldSchultUlriksen/cache-bonanza">slides</a></p>]]></description><link>https://ulriksenblog.azurewebsites.net/heart-rate-of-a-speaker/</link><guid isPermaLink="false">5e2472562807240001af2c8c</guid><category><![CDATA[public speaking]]></category><category><![CDATA[bitshift]]></category><category><![CDATA[nnug]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Wed, 10 Feb 2016 19:33:52 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Beginning of January I was contacted by the Bitshift meetup organizers, asking if I could come over and do a talk or two. Having two talks more or less prepared I decided that it could be a fun experience and also great practice. I gave a talk on caching (<a href="http://www.slideshare.net/HaraldSchultUlriksen/cache-bonanza">slides</a>), and one on the architecture behind personalization of <a href="http://tv.nrk.no">http://tv.nrk.no</a> (<a href="http://www.slideshare.net/HaraldSchultUlriksen/personalisering-av-tvnrkno">slides</a>).</p>
<p>Before and during my previous talks I have noticed that my heart rate goes somewhat haywire, so I decided it could be fun to track it during a talk. As I got on stage, setting up and having a chat with one of the organizers, I turned on my Garmin and noticed straight away that I was having a small workout: 140 bpm. Apparently something happened about 5 minutes in; my pulse peaked at 165 bpm. Here's the whole story<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/heartrate.png" alt="heartrate"><br>
We had a break about an hour in before we started again, easy to spot.</p>
<p>I have a tendency to talk a bit fast, and also run demos waaay too fast. Heart rate may be a part of it, and I will keep tracking it to see if I can manage to calm down a bit. The first chance seems to be April 26th, when I will give a talk on DocumentDB for NNUG Oslo.</p>
<p>Bitshift was a great experience, and it was really fun talking about something I care about and that engages me. I was also surprised by the strong turnout, about the same as we see in Oslo.</p>
<p>Thanks for having me.<br>
<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/bitshift-1.jpg" alt="bitshift-1"><br>
Picture by <a href="http://twitter.com/olavamjelde">http://twitter.com/olavamjelde</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Isolating failures - deliver what you can]]></title><description><![CDATA[<!--kg-card-begin: markdown--><blockquote>
<p>If some of your functionality fails, should it take your website down?</p>
</blockquote>
<p>Let's say you have a frontpage of a webshop displaying promoted items, a campaign and a list of the most popular items. If you fail fetching the popular items, or have some invalid data in it which throws</p>]]></description><link>https://ulriksenblog.azurewebsites.net/isolating-failure-deliver-what-you-can/</link><guid isPermaLink="false">5e2472562807240001af2c8b</guid><category><![CDATA[BlackFriday]]></category><category><![CDATA[scalability]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Tue, 12 Jan 2016 19:48:40 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><blockquote>
<p>If some of your functionality fails, should it take your website down?</p>
</blockquote>
<p>Let's say you have a frontpage of a webshop displaying promoted items, a campaign and a list of the most popular items. If you fail fetching the popular items, or have some invalid data in it which throws an exception, having the whole frontpage fail would be really bad. You want the rest of the page to render.</p>
<p>This happened to us recently, a part of the frontpage failed. But instead of killing the  whole page, only the failing feature failed. Here's how it looks when you enter the page<br>
<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/frontpage1.jpg" alt="frontpage1"><br>
and below you have the failing functionality - in this case it was not even visible to the user until they clicked the &quot;sist sent&quot; (recently aired) tab. Crashing our whole page because of this would be plain out stupid.<br>
<img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/frontpage2.jpg" alt="frontpage2"><br>
This is just as much about isolating features as isolating failures. By isolating features we can start applying several patterns which will help in creating a responsive and well behaved page:</p>
<ol>
<li>
<p>Circuit breaker for each feature/system. If the system is down, deliver what you got.</p>
</li>
<li>
<p>Timeouts per feature - if the most popular items feature will exceed your response time requirements, treat it as a fail and deliver what you got. That's better than building the request queue with waiting clients.</p>
</li>
<li>
<p>Cache - we can cache each part by itself. If we know when features A and B change but not C, we can pre-compute A and B and update the cache, or at least invalidate it. C can then have a less aggressive cache setting.</p>
</li>
</ol>
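<p>Pattern 1 above can be sketched in a few lines. This is a generic illustration, not code from our site; the failure threshold, feature name and empty-list fallback are all invented:</p>

```python
# Minimal per-feature circuit breaker: after too many failures the feature
# short-circuits to a fallback, so the rest of the page still renders.
class FeatureBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, feature, fallback):
        if self.failures >= self.max_failures:   # circuit open: skip the feature
            return fallback
        try:
            result = feature()
            self.failures = 0                    # success resets the counter
            return result
        except Exception:
            self.failures += 1                   # count the failure, degrade gracefully
            return fallback

breaker = FeatureBreaker(max_failures=2)

def broken_popular_items():
    raise RuntimeError("backend down")           # hypothetical failing feature

# Four page renders: every one succeeds, and after 2 failures the breaker
# stops calling the broken feature altogether.
page_sections = [breaker.call(broken_popular_items, fallback=[]) for _ in range(4)]
```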
<p>So, by isolating your features you should be able to deliver some value to the user even if a part of your system is in error.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Introducing DoubleCache]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>DoubleCache, <a href="https://github.com/AurumAS/DoubleCache">https://github.com/AurumAS/DoubleCache</a>, is my own open source implementation of a layered distributed cache, it builds upon solid projects like <a href="http://redis.io/">Redis</a>, <a href="https://github.com/StackExchange/StackExchange.Redis">StackExchange.Redis</a> and <a href="https://github.com/msgpack/msgpack-cli">MsgPack</a> and combines these with a local cache implementation on the .Net stack.</p>
<h2 id="somehistory">Some history</h2>
<p>We're already running a similar solution with my</p>]]></description><link>https://ulriksenblog.azurewebsites.net/introducing-doublecache/</link><guid isPermaLink="false">5e2472562807240001af2c8a</guid><category><![CDATA[Azure]]></category><category><![CDATA[Redis]]></category><category><![CDATA[DoubleCache]]></category><category><![CDATA[Cache]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Wed, 06 Jan 2016 06:15:47 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>DoubleCache, <a href="https://github.com/AurumAS/DoubleCache">https://github.com/AurumAS/DoubleCache</a>, is my own open source implementation of a layered distributed cache, it builds upon solid projects like <a href="http://redis.io/">Redis</a>, <a href="https://github.com/StackExchange/StackExchange.Redis">StackExchange.Redis</a> and <a href="https://github.com/msgpack/msgpack-cli">MsgPack</a> and combines these with a local cache implementation on the .Net stack.</p>
<h2 id="somehistory">Some history</h2>
<p>We're already running a similar solution with my current customer, <a href="https://tv.nrk.no/">NRK</a>. Our motivation for creating it came after migrating from <a href="https://azure.microsoft.com/nb-no/blog/azure-managed-cache-and-in-role-cache-services-to-be-retired-on-11-30-2016/">Azure Managed Cache</a> to <a href="https://azure.microsoft.com/nb-no/services/cache/">Azure Redis Cache</a>. Azure Managed Cache has the nice feature of a local cache on the client, in front of the managed cache. This is not available when using Redis, as Microsoft has not created their own client; instead they recommend using the (excellent) StackExchange.Redis client. When we moved to Redis, our CPU went haywire, as we did a LOT of cache requests (often way too many; it's hard to notice when you hit local memory) and used BinaryFormatter for serialization. Besides cleaning up our data access, we needed our own layered cache. Our layered implementation over StackExchange.Redis is quite OK, but it is a bit tangled with old interfaces and not something that can easily be reused by others. DoubleCache aims to fix this.</p>
<h2 id="mygoalswhencreatingdoublecache">My goals when creating DoubleCache</h2>
<p>I created DoubleCache for a meetup where I talked about the penalties of moving to a remote cache, and I needed some code to highlight the <a href="http://blog.ulriksen.net/single-digit-response-times/">problems</a> with remote caches. As I wanted a clean implementation and other open source projects were too messy, I decided to implement my own with the following criteria</p>
<ol>
<li>It should be possible to use as a local, remote or local and remote cache - with or without sync, through a single interface. Changing from a local to a remote cache should not require any changes in the client using it besides swapping interface implementation.</li>
<li>Each implementation should in itself be trivial</li>
<li>Follow the cache aside pattern</li>
<li>Extending the functionality should not require any modifications to existing implementations. Using standard patterns such as decorator ought to do it.</li>
<li>Serialization must be pluggable.</li>
</ol>
<h2 id="usage">Usage</h2>
<p>Add a reference to DoubleCache using nuget <code>Install-Package DoubleCache</code> and initialize the DoubleCache with a remote and a local cache.</p>
<pre><code>var connection = ConnectionMultiplexer.Connect(&quot;localhost&quot;);
var serializer = new MsgPackItemSerializer();

_pubSubCache = CacheFactory.CreatePubSubDoubleCache(connection, serializer);
</code></pre>
<p>To use the cache, call the GetAsync&lt;T&gt; method. It takes a Func called dataRetriever, which should call your repository or other service. The dataRetriever executes only if the requested key does not exist in the cache, and its result is then added to the cache.</p>
<pre><code>var cacheKey = Request.RequestUri.PathAndQuery;

pubSubCache.GetAsync(cacheKey, () =&gt; _repo.GetSingleDummyUser());
</code></pre>
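<p>The cache-aside flow behind GetAsync can be sketched in a few lines of Python (an illustration of the pattern, not DoubleCache's actual implementation): on a miss the dataRetriever runs and its result is stored; on a hit the retriever is never called.</p>

```python
# Cache-aside in miniature: the retriever only runs on a cache miss.
class CacheAside:
    def __init__(self):
        self._store = {}

    def get(self, key, data_retriever):
        if key not in self._store:                 # miss: hit the backing service
            self._store[key] = data_retriever()
        return self._store[key]                    # hit: no retriever call

calls = []
def fetch_user():
    calls.append(1)                                # count trips to the "repository"
    return {"name": "dummy user"}

cache = CacheAside()
first = cache.get("/api/users/1", fetch_user)
second = cache.get("/api/users/1", fetch_user)
# the retriever ran once; the second call was served from the cache
```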
<h2 id="implementation">Implementation</h2>
<p>The ICacheAside interface is the main part of DoubleCache; all variants rely on implementations of this single interface.</p>
<pre><code>    public interface ICacheAside
    {
        void Add&lt;T&gt;(string key, T item);
        void Add&lt;T&gt;(string key, T item, TimeSpan? timeToLive);

        Task&lt;T&gt; GetAsync&lt;T&gt;(string key, Func&lt;Task&lt;T&gt;&gt; dataRetriever) where T : class;
        Task&lt;T&gt; GetAsync&lt;T&gt;(string key, Func&lt;Task&lt;T&gt;&gt; dataRetriever, TimeSpan? timeToLive) where T : class;

        Task&lt;object&gt; GetAsync(string key, Type type, Func&lt;Task&lt;object&gt;&gt; dataRetriever);
        Task&lt;object&gt; GetAsync(string key, Type type, Func&lt;Task&lt;object&gt;&gt; dataRetriever, TimeSpan? timeToLive);
    }
</code></pre>
<p>The Add method is implemented as fire and forget, hence it does not need to be async; this is handled by the StackExchange.Redis client.</p>
<p>DoubleCache comes with the following implementations of this interface</p>
<ul>
<li>LocalCache.MemCache - using System.Runtime.Memory</li>
<li>Redis.RedisCache - using StackExchange.Redis client</li>
<li>DoubleCache - a decorator wrapping a local and a remote cache</li>
<li>PublishingCache - a decorator publishing cache changes</li>
<li>SubscribingCache - a decorator supporting push notifications of cache updates</li>
</ul>
<p>As seen in the usage example, combining the decorators and implementations provides a high level of flexibility. It comes at a small mental cost when wiring up the DoubleCache constructor, which can be relieved by hiding the constructor behind a factory method.</p>
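<p>As a rough sketch of what the manual wire-up looks like, here's one way the decorators could be combined. The exact constructor signatures below are illustrative and may not match the current source:</p>
<pre><code>var connection = ConnectionMultiplexer.Connect(&quot;localhost&quot;);
var serializer = new MsgPackItemSerializer();

// Remote cache in Redis, publishing change notifications to other clients
var remoteCache = new PublishingCache(
    new RedisCache(connection.GetDatabase(), serializer),
    new RedisPublisher(connection));

// Local in-memory cache, updated when other clients publish changes
var localCache = new SubscribingCache(
    new MemCache(),
    new RedisSubscriber(connection, remoteCache, serializer));

// Tie them together: check the local cache first, fall back to the remote
ICacheAside cache = new DoubleCache(localCache, remoteCache);
</code></pre>
<p>A factory method hides this kind of composition behind a single call, which is why it's the easier starting point.</p>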
<h4 id="localcachememcache">LocalCache.MemCache</h4>
<p>This uses the <code>System.Runtime.Caching.MemoryCache.Default</code> instance throughout the implementation.</p>
<blockquote>
<p>A word of warning: These are mutable objects stored in memory. Be careful if you modify items retrieved from the cache. There can be only one.</p>
</blockquote>
<h4 id="redisrediscache">Redis.RedisCache</h4>
<p>A wrapper around StackExchange.Redis. Since the IConnectionMultiplexer should be a single instance, you pass the Redis database in the constructor. The StringSet call to Redis uses the FireAndForget option.</p>
<h4 id="doublecache">DoubleCache</h4>
<p>This is a decorator used to wire the local and remote cache together.</p>
<p>To achieve this, it simply calls the local cache first and wraps the remote cache in the dataRetriever function:</p>
<pre><code>return _localCache.GetAsync(key, type, () =&gt; _remoteCache.GetAsync(key, type, dataRetriever), timeToLive);
</code></pre>
<p>While DoubleCache will keep the remote cache in sync with any add calls made on the local cache it owns, syncing the local cache with remote cache changes triggered by other clients is not the responsibility of the DoubleCache wrapper. This is covered by the publishing and subscribing cache wrappers.</p>
<h4 id="publishingcache">PublishingCache</h4>
<p>A decorator that publishes changes when items are added to or updated in the cache. The key and object type are published through the ICachePublisher interface, which passes the message on to Redis Pub/Sub. As each cache implementation is responsible for adding an item to itself when the dataRetriever function executes, the PublishingCache wraps that function so a publish happens after it completes:</p>
<pre><code>public Task&lt;T&gt; GetAsync&lt;T&gt;(string key, Func&lt;Task&lt;T&gt;&gt; dataRetriever) where T : class
{
    return  _cache.GetAsync(key, async() =&gt; {
        var result = await dataRetriever.Invoke();
        _cachePublisher.NotifyUpdate(key, result.GetType().AssemblyQualifiedName);
        return result;
    });
}
</code></pre>
<p>The publishing cache is intended to wrap the remote/central cache implementation.</p>
<h4 id="subscribingcache">Subscribing cache</h4>
<p>This decorator wraps a cache and, through the implementation of ICacheSubscriber, calls Add on the wrapped cache when the ICacheSubscriber.CacheUpdate event fires. The RedisSubscriber implementation of ICacheSubscriber also needs a reference to the remote cache, as the Pub/Sub message does not contain the cache item itself, only the key and the type.</p>
<p>The Subscribing cache decorator is intended to wrap the local cache.</p>
<h3 id="serialization">Serialization</h3>
<p>As mentioned in my post on cache speed, there are a lot of options when it comes to serializers. By default, DoubleCache comes with BinaryFormatter and MsgPack.</p>
<p>If your data types support it, I highly recommend something other than BinaryFormatter.</p>
<p>To implement another serializer, implement the IItemSerializer interface:</p>
<pre><code>    public interface IItemSerializer
    {
        byte[] Serialize&lt;T&gt;(T item);

        T Deserialize&lt;T&gt;(byte[] bytes);
        T Deserialize&lt;T&gt;(Stream stream);
        object Deserialize(byte[] bytes, Type type);
    }
</code></pre>
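<p>As an example of the shape such an implementation takes, here's a sketch of a JSON serializer built on Json.NET. It is not part of DoubleCache, and JSON is typically slower than MsgPack for this use case; it's shown purely to illustrate the interface:</p>
<pre><code>public class JsonItemSerializer : IItemSerializer
{
    public byte[] Serialize&lt;T&gt;(T item)
        =&gt; Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(item));

    public T Deserialize&lt;T&gt;(byte[] bytes)
        =&gt; JsonConvert.DeserializeObject&lt;T&gt;(Encoding.UTF8.GetString(bytes));

    public T Deserialize&lt;T&gt;(Stream stream)
    {
        // Read the whole stream as UTF-8 text before deserializing
        using (var reader = new StreamReader(stream, Encoding.UTF8))
            return JsonConvert.DeserializeObject&lt;T&gt;(reader.ReadToEnd());
    }

    public object Deserialize(byte[] bytes, Type type)
        =&gt; JsonConvert.DeserializeObject(Encoding.UTF8.GetString(bytes), type);
}
</code></pre>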
<h2 id="whatsnext">What's next</h2>
<p>I have a few open issues on GitHub which I intend to close:</p>
<ol>
<li>Passing TimeToLive with pub/sub, or retrieving it from the cache</li>
<li>Creating a factory to make it simpler to get started</li>
<li>Adding delete operations</li>
</ol>
<p>You'll find DoubleCache over at GitHub <a href="https://github.com/AurumAS/DoubleCache">https://github.com/AurumAS/DoubleCache</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Single digit response times]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Single digit response times are fully possible. And I think we should try to get there; it may not always be possible, but that's no reason not to explore our options. A fast site will feel more responsive, and using fewer resources means that you can handle massive load on</p>]]></description><link>https://ulriksenblog.azurewebsites.net/single-digit-response-times/</link><guid isPermaLink="false">5e2472562807240001af2c89</guid><category><![CDATA[Cache]]></category><category><![CDATA[BlackFriday]]></category><category><![CDATA[scalability]]></category><category><![CDATA[redis]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Sun, 03 Jan 2016 20:04:40 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Single digit response times are fully possible. And I think we should try to get there; it may not always be possible, but that's no reason not to explore our options. A fast site will feel more responsive, and using fewer resources means that you can handle massive load on modest hardware.</p>
<p>Caching content is perhaps the single most effective measure. Living in a web world, we have three main ways to go:</p>
<ol>
<li>Http caching (think <a href="https://tools.ietf.org/html/rfc7234">RFC7234</a> and headers such as max-age, etags and If-Modified-Since)</li>
<li>Output cache, either built-in or products like Varnish</li>
<li>Application cache.</li>
</ol>
<p>In this post we're going to look at the last of these, the one closest to your stack. This is where you end up when you need to alter the response before you emit data from the server.</p>
<p>Starting out with a small solution, often hosted on a single server, the application cache normally lives in-memory and the objects are kept as-is. It's fast and relatively easy, and the memory management is often taken care of for you.</p>
<p>After a while your application grows, you have multiple servers, and having local caches may no longer be optimal. Problems such as cache sync arise - serving the same user different data just because the user ended up on a different server makes your application feel brittle and awkward. Enter the remote cache, these days often implemented using <a href="http://redis.io/">Redis</a> (Redis is really on a roll, with both AWS and Microsoft offering Redis as a service).</p>
<p>So, what happens when you switch from a local cache to a remote cache? You get two penalties:</p>
<ol>
<li>Network IO</li>
<li>Serialization</li>
</ol>
<p>Redis is fast; we're talking 6-8 ms response times including network IO and serialization. But when your goal is single digit response times, that's too much.</p>
<p>So, how can we have both? A layered cache, using both a remote cache and a local cache. This also plays well with this cloud thing where instances pop up when needed and instance memory is not really expensive.</p>
<blockquote>
<p>How much memory do you really need to keep your entire site in memory?</p>
</blockquote>
<p>To get this to play well we also need to:</p>
<ol>
<li>Keep our local caches in sync</li>
<li>Use fast serialization when we hit the remote cache</li>
</ol>
<p>To keep our local caches in sync we need a pub/sub mechanism. Redis has this built in, which makes the implementation quite trivial. It's not a reliable sync: messages are published once, and if a client is disconnected for some reason, it will not get the update. I find that this is something we normally can live with.</p>
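<p>StackExchange.Redis makes such an invalidation channel a few lines of code. A minimal sketch, where the channel name, message format and the localCache variable are made up for the example:</p>
<pre><code>var connection = ConnectionMultiplexer.Connect(&quot;localhost&quot;);
var subscriber = connection.GetSubscriber();

// Each client listens for keys updated by other clients
subscriber.Subscribe(&quot;cache-updates&quot;, (channel, message) =&gt;
{
    localCache.Remove((string)message); // or re-fetch the item from the remote cache
});

// ...and announces its own updates
subscriber.Publish(&quot;cache-updates&quot;, cacheKey);
</code></pre>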
<p>The serialization part is also solved for us; there are several fast serialization protocols available, such as <a href="http://msgpack.org/">MsgPack</a>, <a href="https://github.com/mgravell/protobuf-net">Protobuf</a>, <a href="https://github.com/Microsoft/bond">Bond</a> or <a href="http://avro.apache.org/">Avro</a>. You will find that these protocols have their pros and cons. In .Net land, MsgPack does not require class markup but cannot cope with inheritance; Protobuf can, but requires a serialization definition. Yan Cui has a nice benchmark over at <a href="http://theburningmonk.com/benchmarks/">http://theburningmonk.com/benchmarks/</a>.</p>
<blockquote>
<p>Whatever you do, BinaryFormatter is slow.</p>
</blockquote>
<p>So, by creating a layered distributed cache we can achieve fast response times and a synchronized cache. As an added benefit we can use the pub/sub mechanism to push data changes to the cache from other services. This comes in real handy in the microservices world.</p>
<p><a href="https://github.com/AurumAS/DoubleCache">DoubleCache</a> is my own implementation of a layered distributed cache for .Net, using Redis as the remote storage and System.Runtime.Caching.MemoryCache locally (which can easily be switched to HttpCache). I've tried to stick to a single interface, making it possible to switch between a local cache, a remote cache, local and remote, or a synced local and remote cache. The implementation follows the cache-aside and decorator patterns.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Availability in an unreliable cloud]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>One of the first problems to solve when moving to the cloud is availability, or just plain uptime. It may seem strange that the cloud is not stable, but this is one of the constraints that enables solid solutions. By accepting unexpected service interruptions as a</p>]]></description><link>https://ulriksenblog.azurewebsites.net/availability-in-an-unreliable-cloud/</link><guid isPermaLink="false">5e2472562807240001af2c88</guid><category><![CDATA[Azure]]></category><category><![CDATA[cloud]]></category><dc:creator><![CDATA[Harald Schult Ulriksen]]></dc:creator><pubDate>Thu, 10 Dec 2015 19:08:00 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>One of the first problems to solve when moving to the cloud is availability, or just plain uptime. It may seem strange that the cloud is not stable, but this is one of the constraints that enables solid solutions. By accepting unexpected service interruptions as a design requirement, the whole solution benefits. You have to design for failure and outages. Think Netflix and their <a href="http://techblog.netflix.com/search/label/simian%20army">simian army</a>.</p>
<p><img src="https://ulriksenblog.azurewebsites.net/content/images/2020/01/simian.png" alt="simian"></p>
<p>Outages in the cloud can be placed in two categories, failures or maintenance, and within maintenance we have planned and unplanned maintenance. To deal with this, and also to meet the SLA requirements, Microsoft has divided Azure into update domains and fault domains. Update domains are related to restarts due to maintenance and upgrades, while fault domains are closely related to HW, such as a rack or power unit <a href="https://azure.microsoft.com/nb-no/documentation/articles/virtual-machines-manage-availability/">https://azure.microsoft.com/nb-no/documentation/articles/virtual-machines-manage-availability/</a>. It is therefore important to have multiple instances, at least one in each fault domain, preferably also one in each update domain. Once you have enough instances you will survive downtime.</p>
<p>Next step on the availability list is to support datacenter outages. To deal with this, one can use multiple datacenters, potentially from different cloud providers, e.g. one Azure datacenter and one Amazon datacenter. Your ability to move between cloud vendors will normally be limited by the choice of technology and implementation. Architecting for multi-vendor support requires a lot more work when it comes to operations and management. Any SAAS platform must have an equivalent offering with both vendors, not to mention how you will deal with the different performance characteristics.</p>
<h3 id="serviceoverview">Service overview</h3>
<p>Here's a short list of some of the Azure services and how replication works.</p>
<table>
<thead>
<tr>
<th style="width: 150px">Service</th><th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top">Azure storage</td>
<td>Azure storage comes in four different flavors:
<div style="width: 500px"><img src="https://ulriksenblog.azurewebsites.net/content/images/2015/12/storageReplication.png"></div>
From https://azure.microsoft.com/nb-no/documentation/articles/storage-redundancy/. Note that Zone Redundant storage is only available for block blobs. 
</td>
</tr>
<tr>
<td style="vertical-align: top">
Azure SQL 
</td>
<td>By default a SQL database is replicated to 3 servers within the datacenter; on the premium tier it can be set up with active geo-replication. The secondary SQL servers can also be used as a read source, however this may hold locks and cause transactions to fail on the master. https://azure.microsoft.com/en-us/documentation/articles/sql-database-geo-replication-overview/#active-geo-replication-capabilities</td>
</tr>
<tr>
<td style="vertical-align: top">DocumentDb
</td>
<td>Each collection in DocumentDb resides on multiple servers, however there's no built-in geo-replication.</td>
</tr>
<tr>
<td style="vertical-align: top">Redis</td>
<td>The basic Redis offering is not replicated. At the standard level there's a master and a slave. And in the recently announced Premium tier there's support for Redis clustering, sharding data over multiple nodes. https://azure.microsoft.com/en-us/documentation/articles/cache-how-to-premium-clustering/</td>
</tr>
<tr>
<td style="vertical-align: top">Search
</td>
<td>Search is divided in replicas and partitions. For read-write and indexing availability, at least 3 replicas are needed. Partitions relates to the storage volume.</td>
</tr>
<tr>
<td>Cloud services/Virtual machines</td>
<td>The number of instances determines how well you can handle outages. You need at least two instances in an availability set, with at least one in each fault domain. 
</td></tr>
</tbody>
</table>
As we see, all of these services provide some sort of replication. Some offer geo redundancy, but in many cases you will not be able to control this in detail yourself. From a performance perspective the landscape varies from service to service; for example, Azure storage must be sharded manually, while sharding is an integral part of DocumentDb. 
<p>Also, while a few of these services provide geo-replication, supporting writes is a completely different story. Especially if we go down the SAAS path, handling writes during outages is not trivial. It may often seem easier to use a PAAS and run software designed for multi-datacenter usage on top of that instead. I've started writing a post where we'll go through a sample setup showing how we can build a solution with Azure SAAS and still survive a datacenter outage.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>