Black Friday has gained traction here in Norway over the last few years, and this year it was bigger than ever. Since Cyber Monday is still very small, most online stores run their offers on Friday as well. This led to what can best be called a nationwide outage: most major online stores went black. They had spent millions on commercials, yet they decided to fail their customers just when those customers came to spend their money.
This is totally unnecessary, and I cannot understand how it is even remotely acceptable. As developers and technology leads, we need to design for massive traffic, and we must run load and performance tests. However, business owners and managers have their part in this as well. They must make sure the developers prepare for such traffic and are allowed to spend resources on it, and when a site fails, they need to understand what went wrong and why it will not happen next time. If this is too technical, get someone from the outside to help you and your team.
So how do we, as developers, make sure this doesn't happen again? There are plenty of things to do besides testing: we have many well-known architectural patterns, and we have the cloud. Cloud services are perfect for these scenarios, and there's plenty of guidance available. However, a one-to-one replication of an on-premises system in the cloud will most likely not solve your problem.
Over the next few weeks I will cover some patterns and practices, books and other resources that are helpful for dealing with high traffic and bursts. Some of the suggestions come with tradeoffs, but what's worse: an unresponsive site or a disabled upsell feature?
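To make the upsell tradeoff concrete, here is a minimal sketch of graceful degradation, assuming a single process that counts its in-flight requests. `LoadShedder`, `render_product_page` and the threshold are hypothetical names chosen for illustration, not from any particular framework; a real limit would come out of your own load tests. While load is below the threshold the page includes the upsell block, and above it the optional work is skipped so the buy button keeps responding.

```python
import threading
from contextlib import contextmanager


class LoadShedder:
    """Counts in-flight requests and reports whether optional extras may run.

    The threshold is a hypothetical tuning knob; in practice it should be
    derived from load and performance tests.
    """

    def __init__(self, max_in_flight_for_extras: int) -> None:
        self._limit = max_in_flight_for_extras
        self._in_flight = 0
        self._lock = threading.Lock()

    @contextmanager
    def request(self):
        # Register the request for its whole lifetime, even if it raises.
        with self._lock:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    def extras_enabled(self) -> bool:
        # Extras are allowed only while load is at or below the threshold.
        with self._lock:
            return self._in_flight <= self._limit


def render_product_page(shedder: LoadShedder, product_id: str) -> dict:
    # The essentials (product and buy button) are always rendered.
    page = {"product": product_id, "buy_button": True}
    # Hypothetical upsell block: only computed when there is headroom.
    if shedder.extras_enabled():
        page["upsell"] = [f"{product_id}-accessory"]
    return page


if __name__ == "__main__":
    shedder = LoadShedder(max_in_flight_for_extras=1)
    with shedder.request():
        print(render_product_page(shedder, "tv-55"))  # one request: upsell included
        with shedder.request():
            print(render_product_page(shedder, "tv-55"))  # two requests: upsell skipped
```

The point of the design is that the degradation is decided per request and costs almost nothing, so the site sheds the nice-to-have work before it sheds customers.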
My first tip is to read Release It! by Michael T. Nygard.