Risk Assessment

A provider had an outage today. Nothing new. Outages happen. What is surprising is the blamestorming from the companies that depend on the provider. Folks, if your business is down and you can’t survive because your infrastructure provider has had a problem, this is your fault. There is a cost to redundancy, and if that cost is greater than the expected impact of the outages over the period you are planning for, then you don’t make your system redundant. You also lose the right to complain about the impact on your business. To make it clear, because people seem to have a hard time understanding this, I’ve built a simple model that can be used to evaluate the various scenarios.
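For anyone who wants the comparison spelled out, here is a minimal sketch in Python. It is not the model mentioned above, just an illustration of the same trade-off. The function names, the `residual_fraction` parameter, and every number in the example are assumptions made up for illustration; plug in your own figures.

```python
def expected_outage_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly cost of outages: frequency x duration x hourly impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


def redundancy_worth_it(redundancy_cost_per_year, outages_per_year,
                        hours_per_outage, cost_per_hour,
                        residual_fraction=0.0):
    """Return True when the outage cost avoided exceeds the cost of redundancy.

    residual_fraction is the (assumed) share of outage impact that redundancy
    does NOT remove, e.g. 0.1 if 10% of the impact would still reach you.
    """
    baseline = expected_outage_cost(outages_per_year, hours_per_outage,
                                    cost_per_hour)
    avoided = baseline * (1 - residual_fraction)
    return avoided > redundancy_cost_per_year


# Hypothetical business: 2 outages a year, 4 hours each, $5,000 lost per hour.
yearly_impact = expected_outage_cost(2, 4, 5000)
print(yearly_impact)                                # 40000
print(redundancy_worth_it(100_000, 2, 4, 5000))     # False: redundancy costs more
print(redundancy_worth_it(25_000, 2, 4, 5000))      # True: redundancy costs less
```

The "cost per hour" figure is where each business folds in its own line items (lost revenue, lost customers, SLA credits, and so on). The sketch only does the comparison once that figure exists.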

I hope this helps people make an informed risk/benefit trade-off. The reduced noise on twitter will be a useful side benefit.

For folks who really want to dig into these arguments, please see Ben Black on SLAs here and here.


7 Responses to Risk Assessment

  1. JZP says:

    “The reduced noise on twitter”? Come on, even the good parts of twitter are noise, man. 🙂

  2. Director of Front says:

    The problem is that the general populace comprehends neither redundancy nor how to properly calculate risk.

    With the huge increase in hosting in general, most customers have turned a blind eye to performing due diligence on their vendors.

    How many customers of major hosting services ask the right questions? How many dig into whether their provider has actual, SRLG-diverse connectivity through transit providers who also maintain diversity? The answer is that few do, and those who do are given broad, generic and unreliable information.

    Times have changed dramatically in the industry. Back in the day, it was easy to ascertain the risks associated with selecting a provider. Now, the sheer number of hosting companies and IP transit providers gives the illusion of diversity and choice. The reality is that most everything is riding on the same infrastructure, and just saying you are multi-homed means essentially nothing.

    While some may not want to hear it, some customers’ best solution is to simply stick with a very small selection of providers who do have their act together. But it’s not going to be cheap, and you need the proper staff and skills to evaluate the options and sift through the marketing & hype.

    In a day with IP transit on the dollar menu, most people just don’t care. As a result, enjoy your fail.

  3. alumiere says:

    Your post and the posts you linked are all good. But there’s one thing I haven’t seen you mention at all, which is that at least in the US virtually all providers and ISPs are using the same long-haul physical infrastructure (i.e., Level3-owned fiber, formerly Williams Communications Group). So unless the hosting companies have satellite service, the sites are going to go down when Level3 has a major outage or two simultaneous cuts, regardless of how much diversity they have. And trans-Atlantic traffic is as bad if not worse – unless something has changed very recently, there are two sets of physical cables, with service over said cables being offered by a plethora of providers.

  4. Martin Barry says:

    Surely the risk assessment has to cover “worst case scenario” and hence the cost of redundancy could also be seen as an insurance policy. The possible scenarios and the associated probabilities would be revealed by any half decent disaster recovery plan which every business should have. Oh, wait…

  5. R Kotwani says:

    Maybe I don’t understand. The determining factor, on whether to build redundancy or not, is oversimplified in your spreadsheet, IMHO.

    From both a customer and an SP point of view, not having HA or redundancy can have severe implications for the SP’s business. Loss of revenue, lost customers, new customer acquisition costs, and the costs associated with decommissioning services for a given customer can all have a huge impact on the SP’s bottom line as well.

    I’m sure most SPs build their ROI model based on certain unknowns (risks). But I think what is fundamental to customer retention, for any business, is having the proper SLAs with customers that identify single points of failure, and customers having the ability to understand the potential risks in case of a service outage.

    • vijaygill says:

      That is correct. Each business has their own calculus for how to come up with a figure that encompasses what you are talking about. Once they come up with that, they can just plug it in as a line item in the lever under cost to business.
