In an earlier post I mentioned that “cloud is software.” Thinking about it some more, I believe the statement can be generalized to “infrastructure is software.” This is a bit different from how people have traditionally viewed it: Internet infrastructure is seen as pipes, disks, CPUs, and data centers, the collection of physical units that provide transport, storage, and compute, plus the buildings that house them. My thesis is that those are necessary but not sufficient to be considered infrastructure. Those elements, in and of themselves, are just so much sunk capital. To make efficient use of them you need the correct provisioning APIs, monitoring, billing, and software primitives that abstract away the underlying systems. That abstraction decouples the various technological and business imperatives so that each layer can evolve independently, based on its own technological scaling domain (within reason: if you are writing ultra-high-performance code, you will know the difference if you get instantiated on an Opteron versus a Nehalem cluster).
Let’s make this concrete and think about how the above can inform the building and operation of a global service provider with a large network and datacenters used for a cloud computing business. For example, a large telecommunications company that wants to provide enterprise cloud computing among a suite of services.
All things come down to the fundamental problem of mapping demand onto a set of lower level constraints. For a telecom company, constraints at the lowest level consist of:
- Fiber topology (or path/Right of Ways)
- Forwarding capacity
- Power & Space
- Follow The Money (FTM)
Everything else is an abstraction of the above constraints. That is the good news. The bad news: everyone has the same constraints. There are no special routers available to you and not to others, and the speed of light is constant (modulo the refractive index of the fiber in your physical plant). So how do you differentiate yourself? Fortunately, the differentiators are also simple:
- Cost (note I did not use price for a reason)
- Open Networks
- Rich connectivity
The impact of latency has been well documented. Some excerpts from Velocity 2009:
Eric Schurman (Bing) and Jake Brutlag (Google Search) co-presented results from latency experiments conducted independently on each site. Bing found that a 2 second slowdown changed queries/user by -1.8% and revenue/user by -4.3%. Google Search found that a 400 millisecond delay resulted in a -0.59% change in searches/user. What’s more, even after the delay was removed, these users still had -0.21% fewer searches, indicating that a slower user experience affects long term behavior. (video, slides)
Phil Dixon, from Shopzilla, had the most takeaway statistics about the impact of performance on the bottom line. A year-long performance redesign resulted in a 5 second speed up (from ~7 seconds to ~2 seconds). This resulted in a 25% increase in page views, a 7-12% increase in revenue, and a 50% reduction in hardware. This last point shows the win-win of performance improvements, increasing revenue while driving down operating costs. (video, slides)
If you want to get into the cloud computing business, you will have to build your network and interconnection strategy to minimize latency. Your customers’ bottom line is at stake here, and by extension, so is your datacenter division’s P&L.
Sean Doran wrote, “People that survive will be able to build a network at the lowest cost commensurate with their SLA.” He forgot to add: in a competitive market. Assuming you are going up against competition, this should be self-evident: efficiency and razor-thin margins. The killer app is bandwidth, and that means people need to emulate Walmart™. Learn to survive on 10% or lower margins. At those margins, your OSS/NMS are competitive advantages. Every manual touch point in the business, every support call for a delayed order, every failure in provisioning, every salesperson who sells a service that can’t be provisioned properly, nibbles at the margin (a back-of-the-envelope sketch follows below). Software that can provision the network, enable fast turn-up, and provide proper accounting and auditing is the key.
As Warren Buffett put it in one of his shareholder letters: “And we react with great caution to suggestions that our poor businesses can be restored to satisfactory profitability by major capital expenditures. (The projections will be dazzling – the advocates will be sincere – but, in the end, major additional investment in a terrible industry usually is about as rewarding as struggling in quicksand.)”
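To make the margin arithmetic concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical, but it shows how quickly manual touch points consume a 10% margin:

```python
# Back-of-the-envelope margin model (all numbers hypothetical).
# At a 10% gross margin, a handful of manual interventions per
# order can consume most of the profit on that order.

monthly_revenue_per_order = 1000.0  # hypothetical circuit price, $/month
gross_margin = 0.10                 # thin, Walmart-style margin
profit = monthly_revenue_per_order * gross_margin  # $100/month

cost_per_manual_touch = 25.0        # hypothetical loaded cost of one support call
touches_per_order = 3               # delayed order, failed provisioning, re-work

touch_cost = cost_per_manual_touch * touches_per_order
print(f"Profit per order:  ${profit:.2f}")
print(f"Manual touch cost: ${touch_cost:.2f}")
print(f"Remaining margin:  ${profit - touch_cost:.2f}")
# Three manual touches eat 75% of the margin on this order; every
# touch that automation removes goes straight to the bottom line.
```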
Efficiency also means fewer operational issues. Couple an ever-increasing number of elements with an ever-growing mass of policy, and you start to lose any semblance of troubleshooting and operational simplicity. Does the network pass the 3 AM on-call test? More policy means more forwarding complexity, and that means more cost hitting your bottom line. A more insidious effect of intelligent, complex networks is that they inhibit experimentation. The theory of Real Options points out that experimentation is valuable when market uncertainty is high. Designing an architecture that fosters experimentation at the edge therefore creates more potential value than centralized administration, because distributed structures promote innovation and enable experimentation at low cost. Putting the intelligence in the applications rather than in the network is a better use of capital: otherwise, applications that don’t need that robustness end up paying for it, and that makes experimentation expensive.
Open networks strike fear into the hearts of service providers everywhere. If you are in a commodity business, how do you differentiate yourself? How about providing service that works well, cheaply? But wait a minute! Whatever happened to “climb up the value chain”? The answer is nothing. You have to decide what business you are in. Moving up the value chain and providing ever higher-touch services is in direct conflict with providing low-cost bulk bandwidth. Pick businesses that require either massive horizontal scaling or deep vertical scaling. Picking both leaves you vulnerable to more narrowly focused competitors in each segment. If horizontal scaling is central to one business, trying to graft on an orthogonal model as a second core business will end up annoying everyone and serving no one well. However, if the software interface to the horizontal business is exposed to the vertical, high-touch side of the business, the two can be decoupled from each other and allowed to scale independently. This means things like provisioning, SLA reporting, billing, and usage reporting all exposed via software mechanisms, as sketched below.
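As a sketch of what that decoupling might look like (all names here are hypothetical, not any particular OSS), the horizontal bandwidth business exposes a narrow software interface, and the vertical high-touch business consumes only that:

```python
# Minimal sketch of the software boundary between the horizontal
# (bulk bandwidth) business and vertical (high-touch) businesses.
# All names are hypothetical; the point is that the vertical side
# consumes the horizontal side only through this interface, so the
# two can scale and evolve independently.

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class CircuitOrder:
    a_end: str          # building identifier, e.g. "NYC-111-8th"
    z_end: str
    bandwidth_mbps: int

class TransportAPI(ABC):
    """What the horizontal business exposes; nothing more."""

    @abstractmethod
    def provision(self, order: CircuitOrder) -> str:
        """Provision a circuit; returns a circuit id."""

    @abstractmethod
    def sla_report(self, circuit_id: str) -> dict:
        """Availability and latency against the contracted SLA."""

    @abstractmethod
    def usage(self, circuit_id: str) -> dict:
        """Billable usage counters, for the billing system."""
```

The high-touch side can build managed services, portals, and custom SLAs on top of this interface without ever reaching into the guts of the transport network, and vice versa.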
Let me start off by saying content is not king.
Gaming companies are making the same mistakes as the content guys. They always over-estimate the importance of the content and vastly underestimate the desire of users/people to communicate with each other and share…
The Internet is a network of networks. The real value of a network is realized when it connects to other networks; more detail can be found in Metcalfe’s Law and Reed’s Law. Making interconnection with other networks harder than necessary will eventually result in isolation and a drift toward irrelevance (in an open market). If people who are transiting your network to reach another network find that the interconnection between your network and their destination network is chronically congested or adds significant latency, their incentive to directly interconnect with the destination network, or to find another upstream, becomes stronger.
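A quick illustration of why rich connectivity matters: under Metcalfe’s Law potential value grows with the number of reachable pairs, and under Reed’s Law with the number of possible subgroups. A few lines of Python make the growth rates vivid:

```python
# Potential value grows much faster than node count.
# Metcalfe's Law counts pairwise connections; Reed's Law counts
# the nontrivial subgroups a network makes possible.

def metcalfe(n: int) -> int:
    return n * (n - 1) // 2   # pairwise links

def reed(n: int) -> int:
    return 2**n - n - 1       # nontrivial subgroups

for n in (10, 20, 40):
    print(f"n={n:>3}  metcalfe={metcalfe(n):>6}  reed={reed(n):>15}")

# Isolating your network shrinks n for your users; the value lost
# is superlinear, which is why chronic congestion at interconnects
# drives customers to route around you.
```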
It ain’t the metal, it ain’t the glass; it’s the wetware.
Make the network database-authoritative. This allows for faster provisioning, consistency, and auditing. You can tell authoritatively whether two buildings across the country, or the world, are on-net and, more importantly, whether they can be connected together and in what timeframe. This is especially true if you have made a few acquisitions with a mixture of assets. Merely mashing together the lists of buildings that are now on-net with the merged entity doesn’t actually tell you whether they can be connected easily or only through several different fiber runs, patch panels, and networks. If the provisioning systems were correct, the sales folks could tell prospective customers when services could be delivered, because they’d know whether connecting two buildings involved ordering cross-connects or doing a fiber build (a sketch of such a query follows below). We provision thousands of machines automatically; why treat thousands of routers differently? The systems that automatically provision and scale your network are hard to implement, but they can be built. It only requires the force of will to make it happen.
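As a sketch of what “database authoritative” buys you (the schema and data here are made up for illustration), the question “can these two buildings be connected, and through what?” reduces to a graph query over the asset database:

```python
# Sketch: if the asset database is authoritative, "can building A
# reach building B, and through what?" is a graph query, not a
# site survey. Schema and data here are hypothetical.

from collections import deque

# Edges in the physical plant: (building, building) -> means, where
# means is "cross-connect" (cheap, fast) or "fiber-build" (expensive,
# slow). Built from the merged asset records of any acquisitions.
plant = {
    ("SEA-01", "PDX-01"): "cross-connect",
    ("PDX-01", "SFO-02"): "cross-connect",
    ("SFO-02", "LAX-03"): "fiber-build",
}

def neighbors(node):
    for (a, z), means in plant.items():
        if a == node:
            yield z, means
        elif z == node:
            yield a, means

def path(a: str, b: str):
    """Breadth-first search for a connection path; returns the list
    of (hop, means) needed, or None if the buildings cannot be joined."""
    seen, queue = {a}, deque([(a, [])])
    while queue:
        node, hops = queue.popleft()
        if node == b:
            return hops
        for nxt, means in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + [(nxt, means)]))
    return None

print(path("SEA-01", "LAX-03"))
# [('PDX-01', 'cross-connect'), ('SFO-02', 'cross-connect'),
#  ('LAX-03', 'fiber-build')]
# Sales can quote a delivery window from the means on the path:
# cross-connects in days, fiber builds in months.
```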
All of these things give a better quality of service to the end user and are a competitive advantage, reducing OPEX and SLA payouts due to configuration errors. You can further extend your systems to do things like automatic rollback if you make a change and something goes wrong, as sketched below.
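A minimal sketch of the change-with-automatic-rollback pattern; the three helpers are hypothetical stand-ins for whatever your NMS actually exposes:

```python
# Sketch of the change-with-automatic-rollback pattern. The three
# helper functions are hypothetical stand-ins for your NMS calls.

import logging

def apply_config(device: str, config: str) -> str:
    """Push new config; return a snapshot id of the previous state."""
    return "snap-0042"  # hypothetical snapshot id

def health_check(device: str) -> bool:
    """BGP sessions up, interfaces error-free, probes passing?"""
    return True  # hypothetical: real checks go here

def restore(device: str, snapshot_id: str) -> None:
    """Revert the device to the saved snapshot."""
    print(f"restoring {device} to {snapshot_id}")

def safe_change(device: str, config: str) -> bool:
    snapshot = apply_config(device, config)
    if health_check(device):
        logging.info("change on %s verified", device)
        return True
    # Something broke: roll back before the 3 AM page goes out.
    restore(device, snapshot)
    logging.warning("change on %s rolled back to %s", device, snapshot)
    return False
```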
Software is the key, no matter what your business is, if it deals with the Internet. And that will only become more true going forward.