cloud economics

Based on a discussion with some friends, I decided to build a very simple model pitting Amazon Web Services (AWS) against colocation in commercial space with owned gear. The model makes a few simplifying assumptions, including that managing AWS takes roughly the same order of magnitude of effort as managing your own gear. As someone put it:

You’d be surprised how much time and effort I’ve seen expended project-managing one’s cloud/hosting provider – it is not that different from the effort required for cooking up in-house automation and deployment. It’s not like people are physically installing the OS and app stack off CD-ROM anymore; I’d imagine whether you’re automating AMIs/VMDKs or PXE, it’s a similar effort.

The results were not surprising to anyone familiar with the term ‘duty cycle.’ Think of it as taking a taxi vs. buying a car to make a trip between San Francisco and Palo Alto. If you only make the trip once a quarter, it is cheaper to take a taxi. If you make the trip every day, then you are better off buying a car. The difference is the duty cycle. If you are running infrastructure with a duty cycle of 100%, it may make sense to run in-house. The model that I used for the evaluation is here.
Note that the pricing is skewed to the very high end for colocation, so the assumptions there are conservative. Levers are in yellow. Comments are welcomed.
I’d like to thank Adam, Dave and Randy for helping me make the model better.
Edit: Some folks are asking for graphs. I thought about adding sensitivity analysis to the model, but that would be missing the point. This model presents an analytical framework that you are free to copy and then sharpen up with your own business model and cost structure. Running sensitivity analysis on that will be much more interesting. I have also added an NPV calculation for the people who asked for it.
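
For readers who want to tinker without opening the spreadsheet, here is a minimal sketch of the framework in Python. Every number below is an illustrative placeholder (a “lever”), not a value taken from the model; plug in your own hourly AWS rate, server capex, colo and staffing costs, and duty cycle.

    # Minimal duty-cycle break-even sketch. Every number is a placeholder lever,
    # exactly like the yellow cells in the spreadsheet -- substitute your own.
    HOURS_PER_YEAR = 24 * 365
    YEARS = 3                        # amortization horizon used in the model

    # Colo side (illustrative levers): capex is paid whether the box is busy or idle.
    server_capex = 3000.0            # purchase price per server, USD
    colo_per_server_month = 150.0    # rack space, power, cooling per server
    staff_per_server_month = 100.0   # share of ops staff attributed to one server

    # AWS side (illustrative lever): you pay only for the hours you actually run.
    aws_per_hour = 0.68              # on-demand rate for a comparable instance

    def colo_cost_3y():
        opex = (colo_per_server_month + staff_per_server_month) * 12 * YEARS
        return server_capex + opex

    def aws_cost_3y(duty_cycle):
        return aws_per_hour * HOURS_PER_YEAR * YEARS * duty_cycle

    for dc in (0.1, 0.3, 0.6, 1.0):
        print(f"duty cycle {dc:.0%}: colo ${colo_cost_3y():,.0f} vs. AWS ${aws_cost_3y(dc):,.0f}")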

21 Responses to cloud economics

  2. Zed says:

    Why are you comparing colocation against AWS? The right comparison is between AWS and dedicated servers. AWS loses big time against dedicated servers, both on price and performance.

    Check out 100tb.com, fdcservers.net, or other leading-edge providers. AWS pricing looks downright medieval compared with $200 dedicated servers offering 1 Gbps of bandwidth.

  3. I’ve been noticing a trend lately that aligns with this, where I think people have forgotten what the E in Amazon’s EC2 is for. First it was about Reddit (see http://www.reddit.com/r/blog/comments/ctz7c/your_gold_dollars_at_work/c0v8ug8?context=1 and http://news.ycombinator.com/item?id=1549737) and then a few days ago I came across http://gigaom.com/2010/08/06/amazon-web-host/. The E is for elastic, and that seems like the most economical use for it.

    I think part of the draw of the cloud is generated by startups and most of them will probably never see 3 years. Lots of startups got caught overbuilding in the late 90s bubble and I think people learned from that. Where things break down is when you either never transition to something like colocation or you move from colocation without looking at the actual cost because “everyone is doing it.”

    Having said all that, there are some things that AWS gives you that you don’t cover in your spreadsheet and that will be harder to cover. Where I think you may run into issues are EBS and ELB. It isn’t much of a stretch to say that 2 FTEs are needed for both AWS and colo, but it gets thinner when you factor in needing to understand load balancers and storage systems. You can probably find 2 FTEs who can do that, but it will be harder, and it is arguable that it will start to take more work to manage those systems than it does with AWS. I wouldn’t mind seeing an updated spreadsheet with one more FTE for managing a SAN/load balancer and the cost of that hardware over 3 years as well.
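
    A rough sketch of that suggested extension, with made-up placeholder figures for the extra FTE and the storage/load-balancer hardware (none of these are real quotes):

        # Hypothetical extension of the colo side of the model: one extra FTE for
        # storage/load-balancer expertise plus the hardware itself, over 3 years.
        YEARS = 3
        extra_fte_per_year = 150_000         # fully loaded cost of one storage/LB engineer
        san_capex = 80_000                   # SAN purchase, assumed 3-year lifetime
        load_balancer_pair_capex = 40_000    # two hardware load balancers for redundancy

        extra_colo_cost = extra_fte_per_year * YEARS + san_capex + load_balancer_pair_capex
        print(f"additional 3-year colo cost: ${extra_colo_cost:,}")   # $570,000 with these numbers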

  4. John Todd says:

    Ah, this spreadsheet looks very familiar – I have a copy that looks almost the same in one of my older directories, though I put in knobs for Amazon’s pricing calculations – I’m not sure where they start to give price breaks (or if they do at all).

    I agree with Carson, in that the “elastic” component is a very appealing part of the EC2 (or any “cloud”) system. Being able to turn single machines up or down as needed means a huge capital expenditure burden vanishes from the books, and hopefully the number of systems running then more closely matches the income generated by those CPU cycles, rather than just serving to warm up a big windowless building somewhere.

    But of course, you countered the “elastic” concept with your description of “100% duty cycle”, which some services may in fact encounter. I’d just caution anyone doing this set of calculations against assuming they fit the model you’ve described. Even a 40% reduction in traffic for 40% of the day (not unreasonable in services that cater to specific geographies) would start to make EC2 look a bit more competitive (a quick worked example follows this comment).

    I do agree that for sizable installations that see steady load, a self-operated data center makes the most sense, but the devil is in the details of any calculation.

    JT
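
    To make the 40%-for-40%-of-the-day example concrete, a quick back-of-the-envelope sketch (illustrative only):

        # Effective duty cycle when traffic drops 40% for 40% of the day:
        # full load for 60% of the hours, 60% of full load for the rest.
        peak_fraction = 0.60          # fraction of the day at full load
        offpeak_fraction = 0.40       # fraction of the day at reduced load
        offpeak_load = 1.0 - 0.40     # a 40% reduction leaves 60% of peak load

        effective_duty_cycle = peak_fraction * 1.0 + offpeak_fraction * offpeak_load
        print(f"effective duty cycle: {effective_duty_cycle:.0%}")   # 84%
        # Owned gear still costs you 100% of the hours; elastic capacity lets you
        # pay for roughly 84% of them.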

  5. Director of Punt says:

    Oh AWS!

    AWS makes sense for small shops that don’t have the time or expertise to build their own hosting or datacenter operation. At some point, certain organizations will reach a breaking point where operating with AWS is not conducive to their operations. If you are experiencing congestion to some point on the Internet, how do you escalate that issue at AWS without purchasing a pricey support contract? How do you deal with the emails from AWS informing you that your instance is going to be bounced?

    Before folks drink the AWS Kool-Aid, remember a few things:

    1: They do not eat their own dogfood. Amazon is a customer of Limelight and Akamai for CDN services. Yet at the same time, they tout their CloudFront (Beta) (no, I’m not joking, it’s actually beta) service as a viable CDN. Sure, it works, but if their own company doesn’t use it, there is probably a reason why.

    2: Elastic Load Balancing is worthless. How can they tout their “scalable” ELB when they use big-metal load balancers for the rest of their services?

    3: Horrible network footprint. It seems like 70% of AWS is operated out of the Washington DC area. They have other nodes (SF, Seattle, Singapore, Dublin), but their bandwidth prices are not cost-effective, resulting in most people sticking with US East (DC). So if you peer with them in Seattle or SF, they don’t send you their DC routes, which contain the majority of AWS. I guess this is what happens when your backbone is composed of GRE tunnels running over your IP transit providers.

    4: Waste of IP address space. Each EC2 instance gets a public IP address. This is really not Internet-friendly. While their VPC service is a way to have your own private domain and private IPs, the IPsec tunnel options are quite limited, and here’s the shocker: it’s all located in DC yet again.

  6. tariq says:

    P(company exists for at least 3 years) * Cost of Cloud = ?
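
    One way to read that one-liner, with invented numbers (the simplification being that cloud spend roughly stops when the company does, while colo capex is committed up front):

        # tariq's formula with made-up numbers: the cloud bill only accrues while
        # the company is alive, whereas colo capex is spent regardless.
        p_exists_3y = 0.3            # assumed probability the startup lasts 3 years
        cloud_3y_cost = 800_000      # pay-as-you-go price over the full 3 years (placeholder)
        colo_3y_cost = 500_000       # up-front-heavy colo price over 3 years (placeholder)

        print("expected cloud spend:", p_exists_3y * cloud_3y_cost)   # 240000.0
        print("committed colo spend:", colo_3y_cost)                  # 500000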

  7. guilespi says:

    EC2 should be compared against leasing, not against CAPEX. You’re saying your dead up-front money on servers has no financial cost?

    wrong.

    • vijaygill says:

      Added an NPV; use the cost of money as a lever. I didn’t use leasing because CAPEX is easier to calculate directly. Again, this is a framework – use the little boxes in yellow to plug in your assumptions.
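
      For anyone rebuilding that NPV lever outside the spreadsheet, a minimal sketch (the discount rate and the cash flows are placeholders, not the spreadsheet’s values):

          # NPV comparison sketch: colo pays capex up front plus monthly opex,
          # AWS pays only monthly. The discount rate ("cost of money") is the lever.
          def npv(rate_per_month, cash_flows):
              """Net present value of monthly cash flows, month 0 first."""
              return sum(cf / (1 + rate_per_month) ** m for m, cf in enumerate(cash_flows))

          months = 36
          rate = 0.10 / 12                              # 10% annual cost of money (placeholder)

          colo_flows = [-250_000] + [-20_000] * months  # capex up front, then monthly opex
          aws_flows = [0] + [-30_000] * months          # pay-as-you-go only

          print(f"colo NPV: {npv(rate, colo_flows):,.0f}")
          print(f"AWS  NPV: {npv(rate, aws_flows):,.0f}")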

  8. David Patterson says:

    There are 2 flaws with this analysis:

    1. If you’re going to assume 100% utilization, the right comparison is with AWS reserved instances, which cost about $0.45 per hour (amortized annual fee + usage fee) versus $0.68 per hour for on-demand instances.

    2. You’re assuming 100% utilization in this analysis. As James Hamilton says, lots of datacenters run at an average utilization of less than 30% over a whole year.

    Another way to look at this is to ask what utilization you need to come out ahead. Using your numbers, the answer is 60%. It’s rare to be able to sustain that 24×365 (a quick sketch of that break-even calculation follows this comment).

    Finally, even if you could, many of us take advantage of the E, running 1000 servers for 10 hours rather than 10 servers for 1000 hours. Getting the answer 100X faster is real value.

    Dave
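
    For what it’s worth, the break-even utilization falls out of a one-line calculation like the sketch below; the colo total and fleet size here are placeholders – only the two hourly rates come from the comment above.

        # Break-even utilization: the duty cycle at which 3 years of AWS equals
        # 3 years of colo. Colo figures are placeholders; the AWS rates are the
        # on-demand ($0.68/hr) and reserved (~$0.45/hr effective) figures above.
        HOURS_3Y = 24 * 365 * 3
        colo_3y_total = 500_000      # placeholder: capex + opex for equivalent capacity
        servers = 50                 # placeholder fleet size

        for rate in (0.68, 0.45):
            breakeven = colo_3y_total / (rate * HOURS_3Y * servers)
            print(f"${rate}/hr -> break-even utilization {breakeven:.0%}")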

  9. Matthew Moyle-Croft says:

    So, my take from this analysis isn’t about “flaws” or trying to attack the assumptions but taking the lessions from it:

    -> if you’re one of the smart kids and can craft your systems to generate 100% “duty cycle” then you’re smart enough to build your own systems and run them efficiently. (Google)

    -> if you’re not one of the smart kids and/or your load is not big enough or smooth enough to generate 100% duty cycle then you’re better off using someone else’s cloud/infrastructure. (most people, including enterprise)

    -> if you’re in between then you should build and operate the bits you can do 100% duty cycle and run in the cloud the bits you can’t. (Amazon in examples above).

    I’m guessing we just need to acknowledge which one we are and not pretend we can’t out-analyse vgill 🙂

    MMC

    • Matthew Moyle-Croft says:

      Lessions = lessons, obviously.

      Obviously the cloud is about acknowledging that you can’t do 100% duty cycle and that you need someone else to aggregate demand to gain that efficiency.

  10. Yinal Ozkan says:

    I am not okay with your West Coast analogy. (I agree that the West and its “interstate highway” culture have always been at the forefront of private car expansion.)

    If you keep using a taxi or a car, you will keep bleeding money. Neither model will survive.

    Today, Amazon’s EC2 may look like an expensive taxi service, but the future lies in public transportation (a.k.a. shared services).

    Instead of a taxi or a car you can utilize a bus or a train service. Your costs will actually be lower.

    Amazon currently charges a premium for a shared infrastructure (taxi). In the near future the cost models will include the “operational” costs.

    I strongly recommend visiting Google’s enterprise Gmail presentations for a different scenario. Instead of driving your own car (managing your own datacenter) or hiring a limo (a taxi with a driver), you will be better off sitting in a comfy chair on a high-speed train (yes, on the East Coast we are not there yet either with Amtrak’s Acela service).

    I ran the numbers myself a few times as well: unless the total cost (development, infrastructure, bandwidth, management) is shared among multiple customers, there are a lot of red stripes in all projections. The good part is that it is possible to “load-balance” the cost by sharing it.

    Cheers,
    – yinal ozkan

    • Matthew Moyle-Croft says:

      @yinal

      So, what you’re saying is that if your load does not allow a 100% duty cycle, or you’re not smart enough to deliver that, then you should take your “load” and get someone else who can combine many “loads” to get a 100% duty cycle. I.e., if you have a duty cycle that is 100%, then you should build it yourself?

      Which is a good thing because otherwise no one would build you the cloud infrastructure.

  11. This is interesting and looks similar to results we got recently: IaaS only makes financial sense if your system can take advantage of elasticity.

    We’ve been working on a cost modeling tool that can be used to calculate the costs of IaaS from different providers so you can compare IaaS with alternatives (e.g. buying servers). We looked at a case study of an organization that is considering either buying a dozen servers from Dell or using EC2. Here’s a link to our paper, which has all the details (read Section 4.3 for the results, and 4.4 for a discussion of the results):

    1008.1900.pdf

    For the case study that we investigated, it turned out that:
    – If the servers run 24×7 then it makes no financial sense to use the cloud.
    – If the servers are used in an elastic manner then the costs are very similar to buying servers.
    – If the servers are used in an elastic manner and AWS reduce their prices by 15% (like they’ve done before), then it makes financial sense to use the cloud.

    Our results took into account NPV and were based on a mixture of reserved instances (for the 24×7 loads) and on-demand instances for the elastic load.
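
    A rough sketch of that reserved/on-demand split (the server counts and peak hours are invented; the hourly rates reuse the $0.45 and $0.68 figures quoted earlier in the thread):

        # Mix reserved instances for the steady 24x7 base load with on-demand
        # instances for the elastic peak. All numbers are placeholders.
        HOURS_PER_YEAR = 24 * 365

        base_servers = 10                # always-on portion of the load
        peak_servers = 30                # extra servers needed during peaks
        peak_hours_per_year = 2000       # how long the peak lasts each year

        reserved_effective_rate = 0.45   # amortized annual fee + usage fee, $/hr
        on_demand_rate = 0.68            # $/hr

        annual_cost = (base_servers * reserved_effective_rate * HOURS_PER_YEAR
                       + peak_servers * on_demand_rate * peak_hours_per_year)
        print(f"annual AWS cost: ${annual_cost:,.0f}")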

  12. J Cornejo says:

    I agree with some of the comments here … from a commercial perspective, AWS gives me a “more” encompassing SLA than co-location alone (I didn’t say higher, for obvious reasons).

    AWS costs include an estimate of personnel hours – which certainly equate to hours of highly skilled resources in a dedicated (or even another cloud) model.

    Accenture has published Eli Lilly’s experiments in the cloud, comparing internal infrastructure against AWS charged by cycles of use – AWS won (then again, what is more complex than testing?).

    Also – a 100% “duty cycle” is unrealistic unless the organization has 24×7 peaks. If a commercial model has ups and downs and is charged by cycles of use, then that discrepancy becomes savings (plus a 100% duty cycle will need an equivalent amount of highly qualified internal or external resources overseeing the solution(s)) – which will increase capital expense rather than operational expense (unless you are using contractors, of course).

  13. I assume you deliberately compare bare metal vs. EC2. Or am I the only one missing software costs? I would typically expect at least some cost block for the operating system and maintenance.

    Jan

    • Sander Chandon says:

      For Microsoft OS licenses you’d see the same kind of calculation as for hardware. In the AWS model there’s a rent component for the license, while for colo you’d have to buy them. Over three years, the cost of purchase would be lower. But like the rest of the model, this only applies at a 100% duty cycle; at a lower cycle the AWS model becomes more cost-effective at some point.
      For open-source OSes, the maintenance (opex) cost would be the same for AWS and colo and can be left out.
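
      Roughly, with invented license prices (the per-hour uplift stands in for whatever the rented-license premium actually is):

          # Rent-vs-buy sketch for a commercial OS license, with invented prices.
          YEARS = 3
          HOURS_PER_YEAR = 24 * 365

          license_purchase = 800      # one-time colo license cost per server (placeholder)
          aws_license_uplift = 0.06   # extra $/hr for the licensed image vs. a free OS (placeholder)

          for duty_cycle in (1.0, 0.5, 0.25):
              rented = aws_license_uplift * HOURS_PER_YEAR * YEARS * duty_cycle
              print(f"duty cycle {duty_cycle:.0%}: bought ${license_purchase}, rented ${rented:,.0f}")
          # With these placeholders, buying wins at a 100% duty cycle and renting
          # wins somewhere around a 50% duty cycle and below.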

  14. Mp says:

    AWS comes with SLA guarantees with respect to security, availability, etc.

    What about the cost of downtime, and the cost of matching AWS SLAs, when you try colocation in commercial space with owned gear?

  15. Very nice analysis. I think it’s hard to avoid the cash-flow implications. Putting money up-front to buy equipment is NOT good for cash-flow, and paying a month after you use your infrastructure IS good for cash-flow. In my experience cash flow is what makes a business a success or a failure.

    More here (a piece I wrote last week for GigaOM’s new cloud computing blog):

    http://cloud.gigaom.com/2010/08/16/how-computing-impacts-the-cash-needs-of-startups/

  16. April Sage says:

    It’s interesting to note the prevalence of transportation-related analogies when referring to the cloud. We sometimes compare computing resources to the logistics of choosing the optimal configuration of buses to meet a city’s transportation needs – it would be immensely more efficient to configure a bus with just the number of seats needed for a specific day or event than to invest in buses with a fixed number of seats and hope that this year’s analysis still serves the logistical picture a few years from now.

    http://resource.onlinetech.com/what-are-the-benefits-of-virtual-private-cloud-computing-hosting/

    When comparing ROI, though, it seems like a big piece of the picture is omitted. A raw power comparison doesn’t reflect the kind of savings that a fully redundant cloud infrastructure can offer by reducing or eliminating downtime from hardware and software maintenance or repairs, for example. Any company that requires high availability needs to analyze the cost/benefit of what cloud computing can offer in terms of increased reliability when workloads are distributed over servers and storage systems capable of shifting resources on the fly without disruption.
