Core Business

September 16, 2010

We see continuous growth in managed services and we are confident that we can help Vodafone free up resources to focus even more on their core business and innovation

Can someone explain what Vodafone’s core business is?


Innovation and Outsourcing

October 13, 2009

Risk:

The CEO of Air New Zealand had this to say on their supplier:

“We were left high and dry and this is simply unacceptable. My expectations of IBM were far higher than the amateur results that were delivered yesterday, and I have been left with no option but to ask the IT team to review the full range of options available to us to ensure we have an IT supplier whom we have confidence in and one who understands and is fully committed to our business and the needs of our customers.”

Reward:

Fake Steve Jobs had this to say:

See, those outsourcing deals always sounded so good: Why do you want to run a messy old data center anyway? We can do it for less than it costs you to do it yourself, and you can focus on your real core competence, which is running an airline.
Except, um, no. An airline’s core competence is running computers. I mean, think about it. Duh

Thing is, these guys did think about it. They knew the deal, but they did it anyway. You know why? Because they got to take a bunch of assets off their balance sheet and send a few hundred IT employees to IBM. It was an accounting maneuver, a way to dress up their financial reports, and it was especially appealing to weak companies. IBM takes your data center off your hands — and in some cases even pays you some money — and then sells it back to you as a service over the next decade.

If you outsource, your cost advantage is lost, and beyond losing the cost advantage, there are some things you will never be able to do. One could argue that it would make the most sense for someone like Google to focus on its core competency and not waste time building servers. But not only does Google build servers, treating them as a core competency let it improve the whole system through optimizations such as on-board batteries, which enabled datacenters without centralized UPSs.

People define core competencies far too narrowly. It is not simply that someone chose to view building servers as a core competency; it is that they saw the massive advantage, across all their efforts, of controlling their infrastructure destiny as an enabler, and so treated it as a core competency.

Those leaps of innovation are just not going to happen if you are focusing on your “core competencies” while letting others build your infrastructure. It can be argued that at Google’s scale, servers are a core competency. By contrast, no one would dispute that if you need only 1,000 servers, you are better off buying them through a reverse auction. But a global service provider is not building 1,000 servers; it is, in fact, working on its core competency, a point that is not as clear to everyone as it perhaps should be. How are you going to avoid being a dumb pipe if you can’t even control your own infrastructure at scale?

Edit: Benjamin Black added clarification


Peering Policy Analysis

September 8, 2009

Peering, or Settlement-Free Interconnect (SFI), is a contentious subject, as can be seen here and here. Having been involved in a few SFI negotiations and disputes, I thought it might be instructive to use an existing SFI policy as a vehicle for analysis.

First, what is SFI? Simply put, it is the bilateral exchange of customer routes between two service providers (SPs) without payment by either side (settlement-free). A more detailed explanation can be found here.

The technical details and the various modes of peering could be discussed at length, but the question at the heart of the matter is: “Will provider X interconnect with me on a settlement-free basis?” Network service providers want to interconnect with other networks on a settlement-free basis because it lets them exchange traffic with those networks for free, instead of paying an upstream provider to carry it. Upstream providers do not want to interconnect on a settlement-free basis because they lose transit revenue.

Geoff Huston has a very good statement of what settlement-free interconnection really means.

The bottom line is that a true peer relationship is based on the supposition that either party can terminate the interconnection relationship and that the other party does not consider such an action a competitively hostile act. If one party has a high reliance on the interconnection arrangement and the other does not, then the most stable business outcome is that this reliance is expressed in terms of a service contract with the other party, and a provider/client relationship is established.

As in retail, where a product has to earn its margin, SFI will only be granted if the benefits of interconnection outweigh the cost. It really is that simple.

With that in mind, let us take a current SFI policy and analyze its technical aspects. To ground the discussion in reality, I will use the Comcast SFI Policy as of September 2009. It is a good example of a well-written, modern SFI policy. Each Comcast policy clause is quoted first, followed by my commentary.

Applicant must operate a US-wide IP backbone whose links are primarily 10 Gbps or greater.

This ensures that the applicant’s network is similar to Comcast’s in size and has a similar cost basis. Similar bandwidth on the interconnecting backbones simplifies traffic engineering and management, because traffic flows tend to be of similar size. Some networks have interconnected at 10 Gbps while their backhaul was limited to STM-1/OC-3 (155 Mbps) links, causing saturation and a poor user experience.

Applicant must meet Comcast at a minimum of four mutually agreeable geographically diverse points in the US. Interconnection points must include at least one city on the US east coast, one in the central region, and one on the US west coast, and must currently be chosen from Comcast peering points in the following list of metropolitan areas: New York City/Newark NJ, Ashburn, Atlanta, Miami, Chicago, Denver, Dallas, Los Angeles, Palo Alto/San Jose, and Seattle.

This clause ensures that the applicant’s network is similar to Comcast’s in scope (and has a similar cost basis), with the same redundancy, size, and diversity of connection. That lets Comcast easily integrate the interconnection and session management into its traffic engineering and operational procedures.

Applicant’s traffic to/from the Comcast network must be on-net only and must amount to at least 7 Gbps peak in the dominant direction. Interconnection bandwidth must be at least 10 Gbps at each interconnection point.

This requirement ensures that the network is on par with other SFI networks, making traffic engineering and operational management easier. The threshold should be revised regularly as networks evolve. The only thing I would change is to substitute average for peak. With peak and 95th percentile, a small number of samples dominate the calculation; with average, they do not. Peak and 95th percentile are therefore relatively easy to game, while average is not. Any metric that lets a small set of samples dominate the outcome is contraindicated in peering calculations, whereas providers prefer such metrics in customer/provider relationships. The former is optimized for volume, the latter for rate.
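A small sketch makes the gaming argument concrete. It assumes a hypothetical day of 5-minute traffic samples, 288 in total: a steady 5 Gbps baseline plus 20 deliberate 9 Gbps bursts. It uses the common billing convention for the 95th percentile, which sorts the samples, discards the top 5%, and takes the highest remaining value. The data and the helper names are illustrative only.

```python
import math

def peak(samples):
    """Highest single sample in the measurement window."""
    return max(samples)

def percentile_95(samples):
    """95th percentile: sort ascending, discard the top 5% of samples, take the highest remaining."""
    ordered = sorted(samples)
    index = math.ceil(0.95 * len(ordered)) - 1
    return ordered[index]

def average(samples):
    """Arithmetic mean: every sample carries equal weight."""
    return sum(samples) / len(samples)

# Hypothetical day of 5-minute samples (288 total), in Gbps:
# a steady 5 Gbps baseline plus 20 deliberate 9 Gbps bursts.
samples = [5.0] * 268 + [9.0] * 20

THRESHOLD_GBPS = 7.0  # the policy's 7 Gbps minimum in the dominant direction
for name, metric in (("peak", peak), ("p95", percentile_95), ("average", average)):
    value = metric(samples)
    print(f"{name:>7}: {value:.2f} Gbps, meets threshold: {value >= THRESHOLD_GBPS}")
```

In this example, twenty bursts (about 7% of the samples) are enough to clear the 7 Gbps bar on both peak and 95th percentile. The average barely moves from the 5 Gbps baseline.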

A network (ASN) that is a customer of a Comcast network for any dedicated IP services may not simultaneously be a settlement-free network peer.

This requirement has caused more confusion than any other clause, to my knowledge. Most people interpret it to mean “once a customer, always a customer, with no possibility of getting SFI in the future.” This is quite incorrect. What it actually means is that if you are a customer, you cannot simultaneously interconnect for free for on-net routes. This comes up when customers want to pay only for off-net traffic. A provider would implement that with multiple interconnections, announcing customer (on-net) routes on some and only peer (off-net) routes on others. If the provider offers this option, there are many ways to game it. This requirement is self-defense, and it eliminates operational complexity.

Applicant must have a professionally managed 24×7 NOC and agree to repair or otherwise remedy any problems within a reasonable timeframe. Applicant must also agree to actively cooperate to resolve security incidents, denial of service attacks, and other operational problems.

Applicant must maintain responsive abuse contacts for reporting and dealing with UCE (Unsolicited Commercial Email), technical contact information for capacity planning and provisioning and administrative contacts for all legal notices.

This requirement ensures that there is a good point of contact that is reachable at any time, considerably simplifying technical and policy coordination between networks.

Applicant must agree to participate in joint capacity reviews at pre-set intervals and work towards timely augments as identified.

Traffic forecasting and pre-planning of capital expenditures, metro upgrades, and PoP upgrades are essential, because upgrades take time to deploy in the field.
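A joint capacity review largely comes down to one question: how long until an interconnect runs hot, and does that leave enough time to order an augment? The sketch below assumes compound monthly traffic growth, a hypothetical 70% utilization trigger, and a hypothetical 6-month lead time. None of these figures come from the Comcast policy.

```python
import math

def months_until_threshold(current_gbps, capacity_gbps, monthly_growth, threshold=0.7):
    """Months until peak utilization crosses `threshold` of capacity under compound monthly growth."""
    limit = capacity_gbps * threshold
    if current_gbps >= limit:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking traffic never crosses the threshold
    return math.log(limit / current_gbps) / math.log(1 + monthly_growth)

def months_left_to_order(current_gbps, capacity_gbps, monthly_growth, lead_time_months, threshold=0.7):
    """Slack before an augment must be ordered so it is in service before the threshold is crossed."""
    return months_until_threshold(current_gbps, capacity_gbps, monthly_growth, threshold) - lead_time_months

# Hypothetical interconnect: 4 Gbps peak on a 10 Gbps link, growing 5% per month,
# with a 6-month lead time for a new metro/PoP augment.
print(f"{months_left_to_order(4.0, 10.0, 0.05, 6):.1f} months left to place the order")
```

A negative result means the augment is already late. That is exactly the situation pre-set review intervals are meant to prevent.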

Applicant must maintain a traffic scale between its network and Comcast that enables a general balance of inbound versus outbound traffic. The network cost burden for carrying traffic between networks shall be similar to justify SFI.

This is another very controversial requirement: the so-called ‘ratio clause.’ The best way to look at it is through Geoff Huston’s definition above; any other way of looking at it is doomed to failure. The clause is another way to ensure that the applicant’s network has a similar scale and scope to Comcast’s, with a similar cost basis as measured by the cost of carriage per bit-mile.
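The sketch below shows the two measurements behind the ratio clause: the traffic ratio between directions and each side’s carriage burden in Gbps-miles. The 2:1 limit is a hypothetical value, because the Comcast text only asks for a “general balance.” The flows and distances are also made up for illustration.

```python
MAX_RATIO = 2.0  # hypothetical limit; the Comcast policy only asks for a "general balance"

def traffic_ratio(inbound_gbps, outbound_gbps):
    """Ratio of the dominant direction to the lesser one (always >= 1)."""
    high, low = max(inbound_gbps, outbound_gbps), min(inbound_gbps, outbound_gbps)
    return float("inf") if low == 0 else high / low

def balanced(inbound_gbps, outbound_gbps, max_ratio=MAX_RATIO):
    """True if traffic in the dominant direction stays within max_ratio of the other direction."""
    return traffic_ratio(inbound_gbps, outbound_gbps) <= max_ratio

def bit_miles(flows):
    """Carriage burden: sum of (Gbps * miles) for traffic a network hauls on its own backbone."""
    return sum(gbps * miles for gbps, miles in flows)

# Hypothetical flows as (Gbps, miles carried on each side's own backbone).
comcast_burden = bit_miles([(6.0, 1200), (2.0, 300)])
applicant_burden = bit_miles([(3.0, 1500), (1.0, 800)])
print(balanced(8.0, 4.0), comcast_burden, applicant_burden)
```

A ratio inside the limit can still hide a lopsided carriage burden. This is why the ratio serves as a proxy for comparable cost basis rather than an end in itself.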

Applicant must abide by the following routing policy:
Applicant must use the same peering AS at each US interconnection point and must announce a consistent set of routes at each point, unless otherwise mutually agreed.

Consistent route announcements help prevent gaming (see the ratio requirement above) and help with troubleshooting and traffic engineering.
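Checking for consistency is essentially set arithmetic over the prefixes seen at each interconnection point. In the minimal sketch below, the announcement data is hypothetical and uses RFC 5737 documentation prefixes. A real check would read the received routes from the routers or a route collector.

```python
def missing_prefixes(announcements):
    """Map each interconnection point to the prefixes it lacks relative to the union of all points."""
    everywhere = set().union(*announcements.values())
    return {site: sorted(everywhere - prefixes)
            for site, prefixes in announcements.items()
            if everywhere - prefixes}

# Hypothetical announcements (RFC 5737 documentation prefixes) seen at three peering cities.
seen = {
    "Ashburn": {"192.0.2.0/24", "198.51.100.0/24"},
    "Chicago": {"192.0.2.0/24", "198.51.100.0/24"},
    "Seattle": {"192.0.2.0/24"},
}
print(missing_prefixes(seen))  # {'Seattle': ['198.51.100.0/24']}
```

Withholding a prefix at one site pushes its traffic to the other sites. This is one way a peer could shift carriage costs onto Comcast, so flagging it is the point of the check.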

No transit or third party routes are to be announced; all routes exchanged must be Applicant’s and Applicant’s customers’ routes.

If a network starts announcing transit or third-party routes, those prefixes will interfere with normal routing and traffic engineering, potentially causing severe disruption to customers’ Internet connectivity. Sending a large number of transit routes can also double or triple the number of paths in the routers, causing them to run out of resources and crash.
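The “no transit or third-party routes” rule can be expressed as a check on each received route’s AS path. The sketch accepts a route only if it arrives from the peer’s AS and every AS on the path lies inside the peer’s customer cone. The ASNs, the cone, and the helper name are hypothetical, with ASNs taken from the RFC 5398 documentation range. On real routers this logic would be expressed as AS-path and prefix filters rather than Python.

```python
# Hypothetical ASNs from the RFC 5398 documentation range.
PEER_AS = 64500
CUSTOMER_CONE = {64500, 64501, 64502}  # the peer plus its (hypothetical) customers

def permitted(as_path, peer_as=PEER_AS, cone=CUSTOMER_CONE):
    """Accept a route only if it was sent by the peer and every AS on the path is in its customer cone."""
    if not as_path or as_path[0] != peer_as:
        return False
    return all(asn in cone for asn in as_path)

routes = {
    "192.0.2.0/24": [64500],                  # peer's own route
    "198.51.100.0/24": [64500, 64501],        # peer's customer
    "203.0.113.0/24": [64500, 64510, 64511],  # learned from a third party: transit
}
accepted = [prefix for prefix, path in routes.items() if permitted(path)]
print(accepted)  # ['192.0.2.0/24', '198.51.100.0/24']
```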

Applicant must filter route announcements from their customers by prefix.

Customer routes are preferred in most networks and are announced to other SFI networks as the best path to reach that customer. If a customer makes an error, such as leaking routes learned from another upstream provider, it can cause significant disruption, for example by making the customer look like it has the best route to that upstream provider. The wrong information may then propagate to Comcast and its SFI peers, causing traffic to be routed incorrectly.
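Filtering customers by prefix means accepting an announcement only if it falls inside address space the customer has registered. The sketch below uses Python’s standard `ipaddress` module. The per-customer prefix lists, the ASNs, and the /24 length limit are all hypothetical, and the sketch is IPv4-only for brevity. Real deployments typically generate router prefix lists from registry data instead.

```python
import ipaddress

# Hypothetical per-customer prefix lists (RFC 5737 documentation space),
# e.g. generated from registry data the customer has registered.
ALLOWED = {
    64501: [ipaddress.ip_network("192.0.2.0/24")],
    64502: [ipaddress.ip_network("203.0.113.0/24")],
}
MAX_PREFIX_LEN = 24  # hypothetical IPv4 limit: reject anything more specific than a /24

def accept(customer_as, prefix):
    """Accept a customer announcement only if it falls inside that customer's registered space."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > MAX_PREFIX_LEN:
        return False
    return any(net.version == allowed.version and net.subnet_of(allowed)
               for allowed in ALLOWED.get(customer_as, []))

print(accept(64501, "192.0.2.0/24"))    # True: the customer's own prefix
print(accept(64501, "203.0.113.0/24"))  # False: another network's prefix, i.e. a leak
```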

Neither party shall abuse the SFI network peering relationship by engaging in activities such as, but not limited to: pointing a default route at the other or otherwise forwarding traffic for destinations not explicitly advertised, resetting next-hop, selling or giving next-hop to others.
Applicant should be willing to enter into an NDA before formal discussions begin.


The abuse requirement simply says: do not try to steal service by pointing a default route at the peer or by tampering with next-hops. The NDA requirement is quite standard when entering into negotiations for something as sensitive as SFI.

Applicant should be advised that the SFI processes will start with a 90 day trial.  On successful completion of that trial, a formal interconnect agreement will be processed.  This agreement will renew annually, subject to the then current SFI Policy.  During the year if there is a violation of the policy, the agreement and interconnections may be terminated upon written notice to the contacts specified in the agreement.

A 90-day trial to verify that the traffic, ratio, and other technical conditions are satisfied is reasonable. It allows sufficient time to verify the volume and ratio claims, but is not so long that it starts to look like a revenue-generation mechanism.

Applicant shall not be permitted to offer or sell any IP transit services providing only AS7922.

This requirement prevents networks that meet the SFI requirements from selling cheap, direct access to the Comcast network to networks that do not otherwise meet them. Such resale would undermine the equivalent-cost-basis argument for SFI.

Applicant must be financially stable.
Comcast requires that Applicants seeking SFI in the United States agree to provide reciprocal SFI arrangement with Comcast in the Applicant’s home market.

Excellent clauses. Comcast is US-centric (for now); if it ever expands into other geographies, a ready-made interconnection system will be in place.

This is a good, rigorous policy that sets out a fair, even-handed system for evaluating SFI with Comcast. The requirements are clear, well articulated, and technically sound, and they make a sensible trade-off between the cost of interconnection and its value to the Comcast customer base.

Article was vastly improved thanks to editing and wordsmithing help from Ben Black.


Femtocells

August 26, 2009

Om Malik wrote an interesting piece on femtocells and the failures of Fixed Mobile Convergence (FMC). Quoting from the article:

According to The Wall Street Journal, femtocells aren’t doing terribly well — sales are slow and demand is weak. It’s a classic chicken-and-egg situation. Carriers are waiting for demand to go up, while folks (like me) are waiting for prices — which currently range from $100 to $250 for the device alone, plus a monthly service fee — to come down.

The rest of the article goes into some detail about the issues, but what jumps out is the phrase “plus a monthly service fee.” This encapsulates precisely what I believe is wrong in the telecom world: more focus on small incremental revenues than on the service and value that would make customers happy. In the mobile industry, 15% to 25% of the entire customer base churns every year. What would it look like if churn were an order of magnitude lower? Let’s see what the benefits of a femtocell are:

  • Remove load from the spectrum allocation and tower backhaul (scarce resources)
  • Improve the customer experience
  • Possibly reduce tower density (and associated cost with rental, power, backhaul)

For all this, you expect the customer to pay you to put a femtocell in their house? How about offering customers a discount for calls made via femtocell?
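The churn figures above explain why a discount can pay for itself. If a constant fraction of customers leaves each year, the mean customer lifetime is 1/churn years. The $50 monthly ARPU, the $5 discount, and the assumption that the femtocell halves churn are all hypothetical. The model is deliberately simple: it ignores the time value of money and acquisition costs.

```python
def expected_lifetime_years(annual_churn):
    """Mean customer lifetime if a constant fraction of customers leaves each year."""
    return 1.0 / annual_churn

def lifetime_revenue(monthly_arpu, annual_churn, monthly_discount=0.0):
    """Undiscounted revenue per customer over its expected lifetime, net of any monthly discount."""
    return (monthly_arpu - monthly_discount) * 12 * expected_lifetime_years(annual_churn)

# Hypothetical $50/month subscriber.
print(lifetime_revenue(50, 0.20))                      # no femtocell, 20% churn
print(lifetime_revenue(50, 0.10, monthly_discount=5))  # $5 femtocell discount, churn halved
```

Under these assumptions, the discounted customer is worth more over their lifetime even though each month brings in less.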

Now comes the delicate balancing act of figuring out who pays for the femtocell. One option is to have customers buy them outright. Another is to sell a discounted version but extend the contract. Asking for a monthly payment from a customer who is buying the device because they are unhappy with the coverage just adds insult to injury.


Infrastructure is software

July 22, 2009

In an earlier post I mentioned that “cloud is software.” Thinking about it some more, I believe the statement can be generalized to “infrastructure is software.” This is a bit different from how people have traditionally viewed it. Internet infrastructure is seen as pipes, disks, CPUs, and data centers: the physical units that provide transport, storage, and compute, and the buildings that house them. My thesis is that these are necessary but not sufficient to be considered infrastructure. Those elements, in and of themselves, are just so much sunk capital. To make efficient use of them, you need the right provisioning APIs, monitoring, billing, and software primitives that abstract away the underlying systems. That abstraction decouples the various technological and business imperatives so that each layer can evolve independently within its own technological scaling domain (within reason: if you are writing ultra-high-performance code, you will notice whether you get instantiated on an Opteron or a Nehalem cluster).

Let’s make this concrete and think about how the above can inform the building and operation of a global service provider with a large network and datacenters used for a cloud computing business: for example, a large telecommunications company that wants to offer enterprise cloud computing as part of a suite of services.

Basic Axioms

Everything comes down to the fundamental problem of mapping demand onto a set of lower-level constraints. For a telecom company, the lowest-level constraints are:

  1. Fiber topology (or path/Right of Ways)
  2. Forwarding capacity
  3. Power & Space
  4. Follow The Money (FTM)

Everything else is an abstraction of the above constraints. That is the good news. The bad news is that everyone has the same constraints. There are no special routers available to you and not to others, and the speed of light is constant (modulo the refractive index of the fiber in your physical plant). So how do you differentiate yourself? Fortunately, the differentiators are also simple:

  • Latency
  • Cost (note I did not use price for a reason)
  • Open Networks
  • Rich connectivity
  • OSS/NMS

Latency

The business impact of latency has been well documented. Some excerpts from Velocity 2009:

Eric Schurman (Bing) and Jake Brutlag (Google Search) co-presented results from latency experiments conducted independently on each site. Bing found that a 2 second slowdown changed queries/user by -1.8% and revenue/user by -4.3%. Google Search found that a 400 millisecond delay resulted in a -0.59% change in searches/user. What’s more, even after the delay was removed, these users still had -0.21% fewer searches, indicating that a slower user experience affects long term behavior. (video, slides)

Phil Dixon, from Shopzilla, had the most takeaway statistics about the impact of performance on the bottom line. A year-long performance redesign resulted in a 5 second speed up (from ~7 seconds to ~2 seconds). This resulted in a 25% increase in page views, a 7-12% increase in revenue, and a 50% reduction in hardware. This last point shows the win-win of performance improvements, increasing revenue while driving down operating costs. (video, slides)

If you want to get into the cloud computing business, you will have to build your network and interconnection strategy to minimize latency. Your customers’ bottom line is at stake, and by extension, so is your datacenter division’s P&L.
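Because light in fiber travels slower than in vacuum, route length sets a hard floor on latency that no router can remove. The sketch below computes propagation delay only. It assumes a refractive index of about 1.468, a typical value for single-mode fiber; the actual value depends on the plant. The route lengths are hypothetical.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
FIBER_INDEX = 1.468       # assumed typical refractive index for single-mode fiber

def one_way_ms(route_km, index=FIBER_INDEX):
    """Propagation delay only: ignores queuing, serialization and equipment latency."""
    return route_km * index / C_KM_PER_MS

def rtt_ms(route_km, index=FIBER_INDEX):
    """Round-trip propagation delay over the same path in both directions."""
    return 2 * one_way_ms(route_km, index)

# Hypothetical: a direct 4,000 km route versus a 6,000 km detour through a distant interconnect.
extra = rtt_ms(6000) - rtt_ms(4000)
print(f"detour adds {extra:.1f} ms of round-trip time")
```

Interconnecting close to where traffic originates and terminates, rather than hauling it to a remote exchange, removes delay that no amount of equipment spending can recover.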

Cost

Sean Doran wrote, “People that survive will be able to build a network at the lowest cost commensurate with their SLA.” He forgot to add: in a competitive market. If you are up against competition, the implication should be self-evident: efficiency and razor-thin margins. The killer app is bandwidth, which means you need to emulate Walmart and learn to survive on margins of 10% or lower. At those margins, your OSS/NMS are competitive advantages. Every manual touch point in the business nibbles at the margin: every support call for a delayed order, every provisioning failure, every salesperson who sells a service that can’t be provisioned properly. Software that can provision the network and enable fast turn-up, proper accounting, and auditing is the key.
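The arithmetic behind “every manual touch nibbles at the margin” is short. The sketch assumes a hypothetical order with $12,000 of revenue at a 10% margin and a hypothetical fully loaded cost of $150 per manual intervention (a support call, a re-provisioning attempt, and so on).

```python
def net_margin(revenue, direct_cost, manual_touches, cost_per_touch):
    """Margin on one order after subtracting the cost of manual interventions."""
    return (revenue - direct_cost - manual_touches * cost_per_touch) / revenue

def touches_to_wipe_out_margin(revenue, direct_cost, cost_per_touch):
    """Number of manual touches that consume the entire margin."""
    return (revenue - direct_cost) / cost_per_touch

# Hypothetical order: $12,000 of revenue at a 10% margin, $150 per manual touch.
print(net_margin(12_000, 10_800, 0, 150))  # clean, fully automated order
print(net_margin(12_000, 10_800, 3, 150))  # three support calls or re-provisioning attempts
print(touches_to_wipe_out_margin(12_000, 10_800, 150))
```

At these numbers, three touches cut the margin from 10% to 6.25%, and eight touches eliminate it. That is the case for automating provisioning instead of staffing around its failures.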

And we react with great caution to suggestions that our poor businesses can be restored to satisfactory profitability by major capital expenditures.  (The projections will be dazzling – the advocates will be sincere – but, in the end, major additional investment in a terrible industry usually is about as rewarding as struggling in quicksand.)
-Warren Buffett

Efficiency also means fewer operational issues. Couple an ever-increasing number of elements with an ever-growing mass of policy, and you start to lose any semblance of troubleshooting and operational simplicity. Does the network pass the 3 AM on-call test? More policy means more forwarding complexity, and that means more cost hitting your bottom line. A more insidious effect of intelligent, complex networks is that they inhibit experimentation. The theory of real options points out that experimentation is valuable when market uncertainty is high. An architecture that fosters experimentation at the edge therefore creates potential for greater value than centralized administration, because distributed structures promote innovation and enable experimentation at low cost. Putting the intelligence in the applications rather than in the network is thus a better use of capital. Otherwise, applications that don’t need that robustness end up paying for it, which makes experimentation expensive.
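A toy Monte Carlo can illustrate the real-options intuition; it is not a formal real-options valuation. It assumes each experiment’s outcome is drawn from Normal(0, σ), where σ stands for market uncertainty, and that you keep only the best result. A single, centrally chosen design has an expected value near zero. The best of many cheap edge experiments is worth more, and its value grows with σ. The function name and parameters are illustrative.

```python
import random

def expected_best_of_n(n_experiments, sigma, trials=20_000, seed=1):
    """Monte Carlo estimate of E[max of n outcomes], each outcome ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) * sigma for _ in range(n_experiments))
    return total / trials

for sigma in (1.0, 2.0):  # low vs high market uncertainty
    single = expected_best_of_n(1, sigma)  # one centrally chosen design
    edge = expected_best_of_n(10, sigma)   # best of ten cheap edge experiments
    print(f"sigma={sigma}: single ~ {single:.2f}, best of 10 ~ {edge:.2f}")
```

The gain from running many experiments scales with uncertainty. This is why making experimentation cheap matters most when the market is least predictable.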

Open Networks

Open networks strike fear into the hearts of service providers everywhere. If you are in a commodity business, how do you differentiate yourself? How about providing service that works well, cheaply? But wait a minute! Whatever happened to “climb up the value chain”? The answer is nothing. You have to decide what business you are in. Moving up the value chain and providing ever higher-touch services are in direct conflict with providing low-cost bulk bandwidth. Pick businesses that require either massive horizontal scaling or deep vertical scaling. Picking both leaves you vulnerable to more narrowly focused competitors in each segment. If horizontal scaling is central to one business, trying to fit an orthogonal model in as a second core business will end up annoying everyone and serving no one well. However, if the software interface to the horizontal business is exposed to the vertical, high-touch side of the business, the two can be decoupled and allowed to scale independently. This means exposing things like provisioning, SLA reporting, billing, and usage reporting through software interfaces.
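One way to picture the decoupling is a narrow interface that the horizontal, bulk-bandwidth business exposes and the high-touch side consumes. The sketch below is entirely hypothetical: the interface, the in-memory backend, the circuit IDs, and the per-GB price all stand in for real provisioning, monitoring, and billing systems. The point is that `monthly_report` depends only on the interface, so either side can change its internals without touching the other.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class TransportAPI(Protocol):
    """Hypothetical interface exposed by the horizontal (bulk bandwidth) business."""
    def provision(self, a_end: str, z_end: str, gbps: float) -> str: ...
    def usage_gb(self, circuit_id: str) -> float: ...
    def availability(self, circuit_id: str) -> float: ...
    def invoice(self, circuit_id: str) -> float: ...


@dataclass
class Circuit:
    a_end: str
    z_end: str
    gbps: float
    used_gb: float = 0.0
    up_minutes: int = 0
    total_minutes: int = 0


@dataclass
class InMemoryTransport:
    """Toy backend standing in for real provisioning, monitoring and billing systems."""
    price_per_gb: float = 0.01  # hypothetical rate
    circuits: Dict[str, Circuit] = field(default_factory=dict)

    def provision(self, a_end: str, z_end: str, gbps: float) -> str:
        circuit_id = f"ckt-{len(self.circuits) + 1}"
        self.circuits[circuit_id] = Circuit(a_end, z_end, gbps)
        return circuit_id

    def record(self, circuit_id: str, gb: float, up_minutes: int, total_minutes: int) -> None:
        """Simulates a monitoring feed updating usage and uptime counters."""
        c = self.circuits[circuit_id]
        c.used_gb += gb
        c.up_minutes += up_minutes
        c.total_minutes += total_minutes

    def usage_gb(self, circuit_id: str) -> float:
        return self.circuits[circuit_id].used_gb

    def availability(self, circuit_id: str) -> float:
        c = self.circuits[circuit_id]
        return c.up_minutes / c.total_minutes if c.total_minutes else 1.0

    def invoice(self, circuit_id: str) -> float:
        return self.usage_gb(circuit_id) * self.price_per_gb


def monthly_report(api: TransportAPI, circuit_id: str) -> dict:
    """The high-touch side consumes only the interface, never the backend internals."""
    return {
        "usage_gb": api.usage_gb(circuit_id),
        "availability": api.availability(circuit_id),
        "invoice": api.invoice(circuit_id),
    }
```

For example, a reseller could call `provision("NYC", "CHI", 10.0)` and later `monthly_report(...)` without knowing how circuits, counters, or rates are stored.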

Rich Connectivity

Let me start off by saying content is not king.

Gaming companies are making the same mistakes as the content guys. They always over-estimate the importance of the content and vastly underestimate the desire of users/people to communicate with each other and share…
-Joi Ito

The Internet is a network of networks. The real value of a network is realized when it connects to other networks; for more detail, see Metcalfe’s Law and Reed’s Law. Making interconnection with other networks harder than necessary will eventually lead to isolation and a slide into irrelevance (in an open market). Suppose people transiting your network to reach another network find that the interconnection to their destination is chronically congested or adds significant latency. Their incentive to interconnect directly with the destination network, or to find another upstream, then grows stronger.
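Both laws can be written as simple counting functions. Metcalfe’s Law counts possible pairwise links, n(n-1)/2. Reed’s Law counts possible subgroups of two or more members, 2^n - n - 1. The sketch also computes the extra value created by interconnecting two networks, which under Metcalfe works out to exactly a·b. Treat these as heuristics about how value scales, not as precise valuations.

```python
def metcalfe(n):
    """Metcalfe's Law: number of possible pairwise links, n(n-1)/2."""
    return n * (n - 1) // 2

def reed(n):
    """Reed's Law: number of possible subgroups with at least two members, 2^n - n - 1."""
    return 2 ** n - n - 1

def interconnection_gain(a, b, law=metcalfe):
    """Extra value created by interconnecting networks of sizes a and b under the given law."""
    return law(a + b) - law(a) - law(b)

print(interconnection_gain(10, 20))        # a * b = 200 new pairwise links
print(interconnection_gain(10, 20, reed))  # far larger under Reed's Law
```

Either way, the value of interconnection exceeds the sum of the parts, which is the incentive that punishes networks that make interconnection hard.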

It ain’t the metal, it ain’t the glass; it’s the wetware.
-Tony Li

OSS/NMS

Make the database authoritative for the network. This allows faster provisioning, consistency, and auditing. You can tell authoritatively whether two buildings across the country or the world are on-net and, more importantly, whether and in what timeframe they can be connected. This is especially true if you have made a few acquisitions with a mixture of assets. Just merging the lists of buildings that are now on-net for the combined entity doesn’t tell you whether they can be connected easily or only through several different fiber runs, patch panels, and networks. If the provisioning systems were correct, the sales team could tell prospective customers when services could be delivered, because they would know whether connecting two buildings involved ordering cross-connects or a fiber build. We provision thousands of machines automatically; why treat thousands of routers differently? Systems that automatically provision and scale your network are hard to implement, but they can be built. It only takes the force of will to make it happen.
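With an authoritative inventory, “can these two buildings be connected, and how fast?” becomes a shortest-path query. The sketch runs Dijkstra’s algorithm over a hypothetical inventory graph in which each link is labeled as already lit, needing a cross-connect, or needing a fiber build. Each label carries a hypothetical lead time in days. The building names and lead times are made up for illustration.

```python
import heapq

# Hypothetical lead times (days) for each kind of inventory link.
LEAD_TIME_DAYS = {"lit": 0, "cross_connect": 5, "fiber_build": 120}

def fastest_path(links, src, dst):
    """Dijkstra over the inventory graph, weighting each link by its provisioning lead time.

    links: iterable of (building_a, building_b, kind); links are treated as bidirectional.
    Returns (total_days, path), or (None, []) if the buildings are not connected at all.
    """
    graph = {}
    for a, b, kind in links:
        weight = LEAD_TIME_DAYS[kind]
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))
    heap = [(0, src, [src])]
    settled = set()
    while heap:
        days, node, path = heapq.heappop(heap)
        if node == dst:
            return days, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (days + weight, nxt, path + [nxt]))
    return None, []

# Hypothetical post-acquisition inventory: the acquired network reaches Bldg-B
# through its own Chicago PoP, one cross-connect away from ours.
links = [
    ("Bldg-A", "PoP-NYC", "lit"),
    ("PoP-NYC", "PoP-CHI", "lit"),
    ("PoP-CHI", "AcqPoP-CHI", "cross_connect"),
    ("AcqPoP-CHI", "Bldg-B", "lit"),
    ("PoP-NYC", "Bldg-B", "fiber_build"),
]
print(fastest_path(links, "Bldg-A", "Bldg-B"))
```

A merged building list would only say that both buildings are on-net. The graph shows that the fast answer is a 5-day cross-connect, not a 120-day fiber build.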

All these things give the end user a better quality of service and are a competitive advantage, reducing OPEX and the SLA payouts caused by configuration errors. You can further extend your systems to do things like automatic rollbacks when a change goes wrong.
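An automatic rollback follows a simple pattern: snapshot the configuration, apply the change, run a health check, and restore the snapshot if the check fails. The sketch uses a toy in-memory `Device` and caller-supplied change and health-check functions, all hypothetical. Some router operating systems offer a similar safeguard natively, for example Junos’s `commit confirmed`.

```python
import copy

class Device:
    """Toy stand-in for a router's configuration store (hypothetical)."""
    def __init__(self, config):
        self.config = config

def apply_with_rollback(device, change, health_check):
    """Apply `change` to a copy of the config, commit it, and restore the snapshot if the health check fails."""
    snapshot = copy.deepcopy(device.config)
    candidate = copy.deepcopy(device.config)
    change(candidate)             # an exception here leaves the device untouched
    device.config = candidate     # "commit" the candidate
    try:
        healthy = bool(health_check(device))
    except Exception:
        healthy = False           # a failing check counts as unhealthy
    if not healthy:
        device.config = snapshot  # automatic rollback
    return healthy

d = Device({"bgp_neighbors": ["192.0.2.1"], "mtu": 9000})
print(apply_with_rollback(d, lambda c: c.update(mtu=1500), lambda dev: dev.config["mtu"] >= 9000))
print(d.config["mtu"])  # 9000: the bad change was rolled back
```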

If your business deals with the Internet, software is the key, whatever that business is, and this will only become more true going forward.


Climbing up the Value Chain

June 2, 2009

There is a lot of fear in telecom about becoming a “dumb pipe.” Is this a problem, and if so, why? Let us start by looking at the ratio of market cap to revenue and see if that provides any insight. I’ve created a real-time Google Spreadsheet that pulls in the current market cap for 6 companies in each sector (Internet vs. telecom) and their revenues for 2008. The implication is that the “over the top” players get roughly a 3x boost in market capitalization by virtue of their business. In other words, if the traditional telecom companies were given the same ratio of market capitalization to revenue, their market cap would triple. So what is going on here?

A great article on Light Reading titled Amazon lessons for telcos features Amazon CTO Werner Vogels’s talk about opening up the Amazon platform to third-party developers and what telecom companies can learn from the Amazon experience. For telecoms, this means opening up the platform, letting various people experiment with the system, and leveraging the key assets telecom companies have: billing relationships, device details, and location. That said, I am going to assert that this is actually not in the telecom DNA. It is not impossible; it is just that you can’t get there from here. This observation follows directly from my earlier article titled Lack of Smart Engineers Considered Harmful. Here is a great quote about this:

“[wireless operators] seem frightened of their own networks, and are heavily dependent on consulting from their suppliers.” – attribution withheld on request

If you are frightened of your own networks and reliant on consulting, you will never make the transition away from being a dumb pipe provider. I can’t fathom how you can claim to run a network that is not a commodity if you can’t even operate it in-house. Giving hundreds of millions of dollars to consulting companies to differentiate you will have only one consequence: consulting companies richer by hundreds of millions of dollars, while you get generic networks that are late and over budget. There is a lot of value to be squeezed out of owning the “deck” on a mobile phone. But as we have seen with the iPhone, Android, and the older Palm Treos, most users of more sophisticated platforms are going to turn the telecom into a bit pipe while they happily use the services on top. The rate at which Apple and RIM are taking profit share should be a clear indicator that the old model is being cannibalized. I am a telecom guy to my core, and even I will admit that getting insanely great people, especially software geeks, to work at a telecom company is next to impossible. And it’s the software, stupid. Of all the companies in this space, I believe Comcast comes closest to getting it, hence its hiring of some insanely great people who understand systems: Mark Muehl, John Schanz, Kevin M. et al.

So, to sum up:

  1. Stop being afraid of your own networks, take charge. This is not rocket science.
  2. Hire the right people.
  3. Accept you cannot be all things to all people. No matter how good you are, someone will come up with a better application that runs over your pipe.
  4. Focus on getting cost out of the network, cut the organization down (do you really need a director of test?). Automate everything, so you can make a decent margin on the dumb pipe.
  5. Be faster (see #1). By definition, if you don’t own your own network, you can’t react quickly.
  6. Partner with people, make your platform open so applications can use your core strengths and work with you, as opposed to working against you (hello VZ Wireless, how is that GPS lock going?)

There are probably more points to add, but for now, this is about the limit of what I believe is achievable.