
Lots of buzz in the air about Cisco entering the server market with the Unified Computing System – using the words “Revolutionary”, “Breakthrough”, etc. Now, to have Cisco enter the server market is certainly revolutionary… but neither their technology nor their business value is particularly new. Their marketing sure turned up the volume on it all, though.

Cisco has sat on the sidelines for a number of years and watched Egenera in particular – as well as IBM, HP and even a few smaller players – enter the market with ‘converged’ networking and repurposeable servers. As long as 7 years ago, Egenera was selling our BladeFrame, which is essentially what UCS is: stateless x86 servers and a converged network backplane. Low component counts, low complexity. Extraordinarily low TCO compared with traditional servers, I/O and networking.

7 years of maturity in the making

So what does 7 years (and hundreds of customers with thousands of installs) get you? Well, a few things. First, on the hardware side, Egenera knows how to move to standard, lower-cost, volume hardware, as demonstrated by teaming with Dell. And Egenera knows how to make the entire *converged* system work over standard Ethernet – to keep costs, complexity and risk to a minimum.

On the Software side, Egenera has also figured out how to make the “special sauce” (our PAN Manager) run on other hardware too.  That means that all of the experience we have with running BladeFrames now comes as a mature 7th-generation software package.  That’s a lot of development.

So big deal. What does that mean? Well, if you look at Cisco’s solution (or HP’s for that matter), you find a very rich set of controls for the converged network, I/O and servers. Yawn. (Egenera did all that years ago.) Cisco then partnered with VMware for virtualization, and BMC for higher-level management like SW provisioning and failover. Good for them. In the meantime, Egenera’s software had already embedded all of these functions within our “single pane of glass” GUI. That’s complete integration of a VM environment as well as HA and DR/COOP functionality – rather than working across multiple vendor GUIs.

What else you should know about Egenera

With this level of product maturity comes a few more things you should think about:

  • Mission-critical production references – Yep, we got ‘em. In just about every market, in just about every geography.
  • Refined, simple-to-use operation – the GUI is easy to learn, and environments can be set up in 6 easy steps.
  • TCO studies, uptime statistics – Got those too. In fact, we can show you statistics from some of the largest users & hosting companies that show “five 9’s” of availability month-to-month.
  • HW and SW certifications – Oh yeah… and by being around for so long, we’ve certified tons of applications and O/Ss on our system, so you don’t have to worry.
  • Integratability – And finally, we “play nice” in your sandbox. With a full Web Services interface (say you want to build a self-service xSP portal) as well as a monitoring API so you can use us with your favorite accounting/chargeback system.

And finally, let’s talk about Value. A bunch is coming out about Cisco’s Pricing.  Then there’s the analysis.  But either way, you’ll find that from a price/performance perspective, the joint Dell/Egenera solution stacks-up against the Cisco approach.

Ask us and you’ll see.

Well, the June 12th Channel Register report on Cisco UCS prices certainly changes the conclusions of the original Cisco price comparison done in April.

For Blades, assuming the “Legacy” Blade price used by Cisco in the original comparison is the actual list price of a “Legacy” Nehalem Blade (Cisco agreed that HP was the “Legacy” system, so for this analysis I use an HP BL460c G6), I was able to configure a “Legacy” Blade and come up with a list price close to the price in the original Cisco comparison. The configuration is two Intel Nehalem 2.53GHz processors with 32GB of “slow” memory and a 73GB 15k SAS hard disk.

Then using the prices in the Register article, I configured a Cisco Blade with equivalent configuration. In the comparison of Blade prices, Cisco originally used a price of $5665 per blade compared against the “Legacy” Blade at $5732. BUT, using that configuration with the Register-reported prices, the comparison is $9497 for Cisco and $5732 for the “Legacy” Blade.

Another takeaway from the Register-reported prices is the Cisco Interconnect Extender. The Register quotes $3749 EACH – not for two as quoted by Cisco. These two items – though mainly the Blade prices – change the results of the 8-Blade comparison, with the Cisco UCS being 48% MORE expensive rather than the reported 13% lower cost.

These items roll into the 320-Blade comparison as well – changing the results for that comparison, which now shows the Cisco UCS solution being 13% MORE expensive rather than the estimated 31% cheaper as quoted in the original comparison. Now, I’ve seen some responses claiming discounts will affect “actual” prices, but that’s not relevant to the point of an “apples to apples” comparison.
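The per-blade part of this comparison is easy to re-check. The following is only a sketch in Python: the prices are the ones quoted in this post, and the helper name `percent_more` is made up for illustration. It covers the single-blade prices only, not the full 8-Blade or 320-Blade configurations, which include other line items.

```python
def percent_more(price, baseline):
    """Return how much more `price` is than `baseline`, as a percentage."""
    return (price - baseline) / baseline * 100.0

legacy_blade = 5732          # "Legacy" blade list price used in Cisco's comparison
cisco_blade_original = 5665  # Cisco blade price in the original comparison
cisco_blade_register = 9497  # same configuration at Register-reported prices

# Negative means cheaper than the "Legacy" blade, positive means more expensive.
original_gap = percent_more(cisco_blade_original, legacy_blade)
register_gap = percent_more(cisco_blade_register, legacy_blade)

print(round(original_gap, 1))
print(round(register_gap, 1))
```

On these numbers, the Cisco blade goes from being about 1% cheaper than the “Legacy” blade to being roughly two-thirds more expensive.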

With the entry of new products to the market such as Cisco’s UCS and HP’s Matrix Operating Environment - a new name for HP’s collection of tools - I thought it would be worthwhile to re-visit the architectures for Real Time Infrastructure and discuss the different approaches and what the strengths/weaknesses are of each. Specifically, I’ll discuss the different fabric architectures used in these products, as that is one of the key differentiators between the various offerings.

The three types of fabrics I’ll discuss here are Converged Fabrics, Dynamic Fabrics, and Managed Fabrics.

Converged Fabrics

I’ve blogged about Converged Fabrics in the past, though when I looked at the date I found out that it was 2 1/2 years ago! Time flies when you’re having fun, and certainly the market has changed quite a bit since then.

A converged fabric architecture takes a single type of fabric (e.g. Ethernet) and converges various protocols on it in a shared fashion. For example, Cisco’s UCS converges IP and Fiber Channel (FC) packets on the same Ethernet fabric. Egenera’s fabric does the same thing on both Ethernet fabrics (with our Dell PAN System solution) and on an ATM fabric (on our BladeFrame solution).

At the endpoint of this fabric is an IO bridge, which is the convergence and conversion point. This is where the protocol gets converted to its native form. For example, when FC traffic is sent over an Ethernet converged fabric, the Ethernet packet carrying the FC data terminates at the IO Bridge. That Bridge then creates a new transmission in native Fiber Channel and sends the data over the FC fabric.

The benefit of this approach is that a single fabric can be used to carry multiple types of traffic. This reduces cables, complexity, and cost. The downside is that packets get terminated and re-started at an IO bridge, which can add latency. However, the latency in the IO Bridge is typically much smaller than the latency of the external devices so, in reality, it’s not a major issue.
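To make the bridge's role concrete, here is a toy sketch in Python. It is not any vendor's implementation: the `EthernetFrame` class and `io_bridge` function are hypothetical names, and a real FCoE frame carries extra header fields that are ignored here. The only real-world constants are the EtherType values for FCoE (0x8906) and IPv4 (0x0800).

```python
from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to Fibre Channel over Ethernet
IPV4_ETHERTYPE = 0x0800  # EtherType for ordinary IPv4 traffic

@dataclass
class EthernetFrame:
    """Simplified frame on the converged fabric (hypothetical model)."""
    ethertype: int
    payload: bytes

def io_bridge(frame):
    """Terminate a converged-fabric frame and decide where its payload goes next."""
    if frame.ethertype == FCOE_ETHERTYPE:
        # The Ethernet transmission ends here; the payload is re-sent
        # as a new, native transmission on the FC fabric.
        return ("fc_fabric", frame.payload)
    # Everything else continues on the Ethernet/IP side.
    return ("ethernet", frame.payload)

storage = io_bridge(EthernetFrame(FCOE_ETHERTYPE, b"scsi-read"))
network = io_bridge(EthernetFrame(IPV4_ETHERTYPE, b"http-get"))
```

Both kinds of traffic arrive over the same wire, and the bridge is the one place where they part ways, which is also where the extra hop of latency comes from.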

Vendors that use this fabric architecture include Egenera, Cisco, and the Infiniband vendors such as Xsigo and Voltaire. There are also a host of switch vendors who will take advantage of the new IEEE Ethernet standards to deliver switches that can support this type of architecture. While Cisco’s fabric is new to the market, Egenera has been selling this solution for 7+ years and has hundreds of customers and over 1400 installations. This architecture has been proven in the market.

Dynamic Fabrics

Dynamic Fabrics are not converged, but rather separate fabrics that can have their configuration modified dynamically. This is the approach that HP uses. Rather than utilize a converged fabric, HP has separate fabrics for FC and Ethernet. These fabrics can be dynamically re-configured to account for server fail-over and migration. HP’s VirtualConnect and Flex10 products are separate switches for Fiber Channel and Ethernet traffic, respectively. When a server is moved (maintenance, fail-over, etc.) the switch is re-programmed so that the communication paths are re-wired to match where the server has been moved. For example, HP’s VirtualConnect leverages NPIV technology, which allows a WorldWideName to be re-deployed from one switch port to another, thus allowing a server to be moved and still have access to its LUNs. This removes the need to re-zone the SAN when a server is moved. It’s useful technology.
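The idea of moving a WorldWideName between ports without re-zoning can be pictured with a small toy model in Python. This is purely illustrative and is not how HP's products are implemented; the `DynamicFabric` class, its methods, and the sample WWN are all hypothetical.

```python
class DynamicFabric:
    """Toy model: tracks which switch port each server WWN is logged in on."""

    def __init__(self):
        self.port_of = {}  # WWN -> switch port
        self.zoning = {}   # WWN -> set of LUN ids the WWN is zoned to

    def login(self, wwn, port, luns):
        self.port_of[wwn] = port
        self.zoning[wwn] = set(luns)

    def move(self, wwn, new_port):
        # Only the port mapping changes. Zoning is keyed by WWN,
        # so it does not need to be touched when the server moves.
        self.port_of[wwn] = new_port

    def visible_luns(self, wwn):
        return self.zoning[wwn]

fabric = DynamicFabric()
fabric.login("50:01:43:80:00:00:00:01", port=3, luns=[10, 11])
fabric.move("50:01:43:80:00:00:00:01", new_port=7)
luns_after_move = fabric.visible_luns("50:01:43:80:00:00:00:01")
```

Because access is tied to the WWN rather than the physical port, the server sees the same LUNs before and after the move.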

The benefit of this type of fabric is that it reduces port count (NPIV ports can be shared) and complexity. However, it doesn’t reduce the cost. The servers need to have both Ethernet and Fiber Channel cards, and there are multiple fabrics, which are fairly expensive.

Vendors that use this type of technology include HP and Fujitsu.

Managed Fabrics

The 3rd type of fabric is a Managed Fabric. In this architecture there is no convergence at all. Rather, the vendor programs the Ethernet and Fiber Channel switches to allow servers to migrate. This is a bit like the Dynamic Fabric above, however, these typically are not captive switches and there is no convergence whatsoever.

The advantage of this approach is that it allows the customer to use their existing switches. The downside to this approach is that the management software is literally re-programming the premise switches (Ethernet and SAN) on the fly, which is typically not allowed in a well-run data center. This can lead to security issues, maintenance issues, compliance issues, and data center policy issues. This approach is better suited for test/dev environments where switch re-programming is not as big of an issue (though still an issue I maintain).

Vendors that use this approach include Scalent.

Summary

So, that was a quick trip through the various fabric architectures for a real time infrastructure. Each has its plusses and minuses. I believe the Converged Fabric approach is the correct architecture (big surprise there!) and also the future of where the data center is headed. It provides many customer advantages, as customers can use an interconnect they are already familiar with (Ethernet), with a large ecosystem and a strong roadmap. It is also the best fabric architecture for lowering costs and complexity. As the IEEE standard gets finalized and the prices for 10G Ethernet and Converged Network Adapters come down, I predict that the data center will finally see widespread adoption of the Converged Fabric.

Seems like virtual switches are now the new “it” thing. Cisco announced one that plugs in to VMware environments. Now comes news from Citrix Synergy that they are also developing a virtual switch for Xen and KVM. Why all the buzz over virtual switching?

There are 2 good reasons for virtual switches  - one is to make it easy to deploy and migrate virtual servers. The other is to allow network administrators access to managing virtual switches deployed by hypervisors. The former is a technical solution to help virtual machines become more flexible; the latter is a solution to an operational issue, where network administrators want control over every switch that is deployed in the data center.

There is also another benefit of vswitches. It’s easier to roll out new functionality in a virtual switch than it is to change a physical switch, even if it’s just software in either case. Customers are less willing to upgrade their hardware/firmware on their premise networks compared to a virtual switch in a virtual environment. As a result, you may see quicker deployment of features like QoS, enhanced security, and more granular network statistics that not only monitor the physical hardware, but also the virtual networks.

Be careful when looking at these virtual switches, though. A close inspection might reveal some unwanted performance side effects. For example, rather than allow two virtual machines connected on the same vswitch to communicate directly, Cisco’s vswitch forces the packets to flow out of one virtual machine, to a physical switch, and then back to the destination virtual machine. That seems kind of inefficient to me.

Egenera has been providing virtual switching for physical and virtual servers for many years. Because we were the first unified fabric computing system on the market, we had to face these same technology challenges on the way. Virtual switching is a natural evolution allowing server personalities to migrate to different physical hardware while at the same time preserving the network configuration and topology.

The key to virtual switching is that it must fit in with the current operational environment and existing switching environment in your data center. If not, you may be getting more of a headache than a solution.

Back on 16 April, Cisco began its marketing push on its Unified Computing System (UCS) hyping the fundamental cost savings it provides – as much as 31% over traditional approaches for a large scale-out implementation.

But after close scrutiny, Cisco either needs to check their math or adjust their pricing. Their claims about TCO, bandwidth etc. are overstated, creating  erroneous conclusions on bottom-line savings.

Unfortunately, the media probably accepted this analysis without taking a close look. BTW, we’re referring to slides they used during a webinar, available at http://blogs.zdnet.com/BTL/?p=16477.

Setup: Analysis of Cisco’s Slide “Sample Configuration - 8 blades”

Management is shown as $7,000 for 8 Blades. This is either $7,000 per chassis or $875 per blade. An interesting coincidence that HP’s Virtual Center Enterprise Management software has a list price of $7,000 per HP c7000 Bladesystem chassis. Another coincidence – for the 8 Blade configuration, Cisco lists “adaptors” for the blades costing $5,992 – that’s $749 each. The 4Gb/s Fiber Channel SAN mezzanine adaptors for the HP Bladesystem cost $749 each on the HP website. And there’s more – the 10Gb Ethernet switches are listed at $24,398 by Cisco – that’s $12,199 each. The HP FLEX10 Virtual Connect Ethernet switch costs $12,199. OK, last coincidence – Cisco lists Fiber Channel switches at $18,998 = $9,499 each. Yes, HP’s website lists 4Gb/s Virtual Connect Fiber Channel switches for $9,499. So, I think we can conclude the “Legacy System” Cisco is comparing to is an HP cClass Bladesystem. Knowing all of this, we can now analyze the 320 blade configuration to reconcile the costs.

Analysis of Cisco’s Slide “Sample Configuration - 320 blades”

On this slide, management software is listed at $554,400. Since we know this is an HP c7000 BladeSystem, which holds 16 Blades, there are 20 chassis – each with 16 Blades. The calculations work out correctly: $8,713 from the 8 Blade configuration times 20 = $174,260. If management software is a chassis-based price ($7,000 from above), then with 20 chassis this total should be $140,000. However, Cisco shows $554,400 – a $414,400 mistake. Cisco’s $554,400 comes out to $27,720 per Blade chassis, or about $1,732 per Blade. Now that’s a pretty big mistake.
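For anyone who wants to re-run the numbers, here is a minimal sketch in Python. It assumes, as argued above, that management software is priced at $7,000 per 16-blade chassis; the variable names are just for illustration.

```python
blades = 320
blades_per_chassis = 16       # HP c7000 chassis capacity
mgmt_per_chassis = 7_000      # assumed chassis-based list price, from the 8-blade slide
cisco_slide_total = 554_400   # management software line on the 320-blade slide

chassis = blades // blades_per_chassis
expected_total = chassis * mgmt_per_chassis
overstatement = cisco_slide_total - expected_total

# What the slide's figure implies per chassis and per blade.
implied_per_chassis = cisco_slide_total / chassis
implied_per_blade = cisco_slide_total / blades

print(chassis, expected_total, overstatement)
print(implied_per_chassis, implied_per_blade)
```

Under that assumption the slide's management line is a little over $414,000 higher than a chassis-based price would give.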

Bottom-Line: So, if we adjust for this mistake, the difference between Cisco’s UCS total and the “Legacy” total (for 320 blades) is really only 14% - not 31%.

Cisco’s error comparing “apples-to-apples”:

Also, the Cisco configuration for 320 blades includes 40 Extenders. We know this because if you divide the 320 Blade configuration’s extender cost by the 8 Blade “Extender” cost, the result is 40. We also know that each Cisco Blade chassis holds 8 blades, so 40 chassis are needed to support 320 blades. SO, there is ONE Extender in each of the 40 Cisco Blade chassis in this cost comparison. That means that the extender is a single point of failure for each blade chassis in this configuration. In contrast, the (HP) “legacy system” is priced with full redundancy.

Bottom-Line: Cisco is not providing a fair comparison. Advantage HP.

A bit of hype regarding network bandwidth:

The 40 Fabric Extenders mentioned above mean there is a maximum throughput of 40Gb/s for each chassis with 8 Blades. The Extenders have only four 10Gb/s FCoE ports for attachment to the fabric switch. So, that amortizes to 5Gb/s for each Blade – “IF” the ports on the Cisco Blades can be teamed or configured as Active/Active. If not, then this is a fabric throughput of 2.5Gb/s per Cisco Blade. So, this Cisco UCS configuration is not a 10Gb/s fabric – it is either a 2.5Gb/s fabric or, best case, a 5Gb/s fabric.

We can answer this question too, because there is another factor at play. EACH Fabric Interconnect (aka Nuova switch) supports 40 fabric connections to the Cisco Blade chassis. In order to support 320 Blades, each Cisco Blade chassis can only have ONE 10Gb/s connection to EACH of the redundant Cisco 6100 Fabric Interconnect switches. So, EACH Cisco Blade chassis has a maximum of TWO (redundant) 10Gb/s fabric connections shared by 8 Blades. This calculates to 20Gb/s for 8 Blades – or 2.5Gb/s per Blade. Thus, the Cisco 320 Blade solution has a 2.5Gb/s fabric capacity.

Now, for the “Legacy” (HP) system, recall that the blades have redundant (2x) 10Gb/s Ethernet ports and redundant (2x) 4Gb/s Fiber Channel ports. The redundant (2) 10Gb/s Ethernet switches in the “Legacy” system have a total of 16×10Gb/s uplink ports – so there is enough fabric and switch bandwidth for a true 10Gb/s Ethernet throughput for each Blade. The redundant (2) Fiber Channel switches have 8x 4Gb/s Fiber Channel ports, which is an additional 0.5Gb/s of Fiber Channel throughput per Blade in the Legacy system.
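The per-blade bandwidth figures come from simply sharing the chassis uplinks across the blades behind them. Here is a rough sketch in Python; the function name `per_blade_gbps` is made up, and the link counts are the ones discussed in this post, not measured values.

```python
def per_blade_gbps(links, gbps_per_link=10, blades=8):
    """Uplink bandwidth shared evenly across the blades behind those links."""
    return links * gbps_per_link / blades

# One extender with four 10Gb/s ports, all usable: the best case.
ucs_best_case = per_blade_gbps(links=4)

# One 10Gb/s link to each of two interconnects: the 320-blade case.
ucs_320_blade = per_blade_gbps(links=2)

# "Legacy" chassis: 16 x 10Gb/s Ethernet uplinks shared by 16 blades.
legacy_ethernet = per_blade_gbps(links=16, blades=16)

print(ucs_best_case, ucs_320_blade, legacy_ethernet)
```

The gap between the second and third numbers is the heart of the argument: the two systems being priced against each other are not delivering the same fabric bandwidth per blade.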

Bottom line: This is an unfair cost comparison of a 10Gb/s “Legacy” (HP) system to a 2.5Gb/s Cisco UCS system.

High bandwidth limited to low blade count

Cisco’s 320-Blade configuration shows us that the “Fabric Interconnect” to support it actually costs $138,182. Cisco recommends two Fabric Interconnect units for the Cisco 5020 products, and I assume the same is the case here. So, each of these Cisco 6100 Fabric Interconnect units costs $69,091. These are the 40-port switches needed to support the 320-Blade configuration. If a customer wants a full 10Gb/s fabric, each Blade chassis will need two Fabric Extenders ($3,998 each) AND must use all 8×10Gb/s uplinks on the Fabric Extenders. This also means the 40-port Fabric Interconnect switch will only support FIVE blade chassis – not 40. So a 10Gb/s fabric version of the Cisco UCS is limited to 40 Blades maximum.
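The port arithmetic behind that 40-blade ceiling can be sketched in a few lines of Python. This follows the reasoning in this post (eight uplinks consumed per chassis on a 40-port interconnect) rather than any Cisco sizing guide, and the variable names are illustrative only.

```python
interconnect_ports = 40    # ports on one Fabric Interconnect
uplinks_per_chassis = 8    # two extenders x four 10Gb/s ports, all in use
blades_per_chassis = 8

max_chassis = interconnect_ports // uplinks_per_chassis
max_blades = max_chassis * blades_per_chassis

interconnect_pair_cost = 138_182          # from the 320-blade slide
cost_per_interconnect = interconnect_pair_cost / 2

print(max_chassis, max_blades, cost_per_interconnect)
```

So the same pair of switches that fronts 320 blades at reduced bandwidth fronts only 40 blades at full bandwidth.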

Bottom line: If a customer is interested in full bandwidth 10Gb/s connections to each of the Blades, the entry cost is $138,182 for the networking only, and only for 40 slots. That’s roughly $3,455 per Blade slot for infrastructure. And then the blades themselves are additional costs!



Gartner’s recent article Oracle RAC Moved to Mainstream Use made me think about how important Oracle’s work has been around RAC – but also how its progress could have been even faster if not for the stubborn complexity of the underlying physical infrastructure.

Let me explain with an example. Oracle Grid, aka Oracle RAC, allows database administrators to string multiple (I have heard up to 50) x86 white boxes or blades together to create one large Oracle Engine for mission-critical databases. If you lose one blade, the engine is still running while you replace it. If you run out of capacity in the engine, just add another blade and scale or shrink accordingly.

But physical infrastructure complexity hurts the stability, scalability and flexibility of larger RAC environments. For example, to set up a 50 node RAC, an IT organization must string an estimated 200 to 300 Network and SAN cables together…the very definition of hairball! I believe RAC adoption for databases has been slowed because of this remaining complexity.

As you can see, the server platform holding the RAC node really DOES matter. Eliminate the “hairball” and you can get on with achieving GRID for even the largest installations.

Ok. . .so there’s the catch. Most hardware vendors that offer commodity boxes or blades don’t solve the physical complexity problem. Some of the larger SMP machine vendors have solved this, but large SMP machines don’t make sense for RAC. It doesn’t make sense to set up an 8-node RAC with 8 Superdomes. Imagine needing to add a ninth node: “Boss, I need to buy another Superdome . . .can you help?”

Now there are a handful of vendors (shameless plug for the Egenera-Dell partnership) that have solutions to counter the physical complexity problem by consolidating IO and leveraging SAN and NAS infrastructure to the fullest. We use blades as do HP and IBM. But, Egenera and Dell PAN System blades are very different.

For starters, they’re stateless, with most of the physical complexity associated with standard blades replaced by IO and network virtualization software. Stateless blades can be virtually strung together and added to the cluster at the click of a button. Stateless blades mean that instead of 300 cables for a 50 node RAC, you end up with 50 cables, a much more manageable issue.
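To put rough numbers on the hairball, here is a back-of-the-envelope sketch in Python. The per-node cable counts are inferred from the estimates in this post (200 to 300 cables for 50 nodes works out to 4 to 6 per node, versus one per stateless blade); they are assumptions, not measurements from a real installation.

```python
def cable_count(nodes, cables_per_node):
    """Total network and SAN cables for a cluster of identical nodes."""
    return nodes * cables_per_node

nodes = 50

# Conventional servers: assumed 4 to 6 network + SAN cables each.
conventional_low = cable_count(nodes, 4)
conventional_high = cable_count(nodes, 6)

# Stateless blades: assumed one cable each, per the 50-cable figure above.
stateless = cable_count(nodes, 1)

reduction = 1 - stateless / conventional_high
print(conventional_low, conventional_high, stateless, round(reduction * 100))
```

Against the high-end estimate, that is a reduction of more than 80% in cabling for the same node count.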

After removing the physical complexity of an Oracle Grid by replacing the physical mess with software, our clients have successfully set up double-digit RAC Node environments and maximized their return on investment.

Do you agree? Weigh-in with your thoughts.

Check out the post fellow blogger and colleague Ken Oestreich wrote recently. It’s a great description of the Processing Area Network. Sometimes a picture/video is better than text and Ken does a great job illustrating the architecture and value of PAN.

Given all the buzz around Cisco UCS, Ken’s blog is timely and informative. Way to go Ken!

Yogi Berra said it, Cisco just reconfirmed it.

Reviewing Cisco’s much-leaked and much-hyped announcement of Project California was like a trip down memory lane.

Blade Form Factor - Check!

Unified Fabric for IP and Storage traffic - Check!

Management Console - Check!

Integration with Virtual Machine Management - Check!

Sounds oh so PAN like to me. If imitation is indeed the sincerest form of flattery, then we at Egenera feel very flattered.

OK, so I’m being a bit facetious today, but can you blame me? Cisco is bringing to market the very architecture that Egenera created over 8 years ago. In some regards, it’s great to see Cisco validate the PAN architecture and put their muscle behind pushing this into the enterprise. It is the right architecture and the right way to solve the complexity problem that has been plaguing the data center since the migration away from mainframes.

We at Egenera have been pushing this message for years and if you’ve read this blog, you know that it’s been a core part of my message since I started writing a couple of years ago. Complexity is the biggest issue facing data centers and technologies like converged fabrics or methodologies like cloud computing were invented to solve complexity.

Cisco bringing a PAN-like product to market will no doubt help reduce complexity in the data center. However, this is only part of the solution. It will be interesting to see if Cisco can also be a trusted server partner and help customers through their day-to-day issues that span servers, networks, storage, application, HA, and management. If so, then this will truly be a transformational product for Cisco.

But, it will take time for Cisco to hone these skills and for their product to mature to the point where it makes an impact. If you don’t have the time (who does?), you should check out the original Processing Area Network product at Egenera. Talk to our 1400+ installations and see how they’ve already reduced their data center complexity.

It appears the Egenera/Dell deal is moving forward with velocity. A first joint customer was announced today, the US Department of Veterans Affairs (VA) Corporate Data Center Operations (CDCO).

This is the first of a pipeline of customers buying-into Infrastructure Orchestration (also referred to as Fabric Computing or Unified Computing) - first offered in 2001 by Egenera with their high-end BladeFrame + PAN Manager software, and now being mainstreamed by combining PAN Manager with Dell hardware.

The VA’s CDCO is really a hosting facility - much the way an xSP hosts applications for third-parties. In this case, they’re hosting a mission-critical application for an influenza early-warning system. I’ve met with their CTO, who’s a pretty forward-thinking guy. He recognizes the fact that his “customers” frequently change requirements, and computing demands frequently change too. So, for an environment comprising physical databases and virtualized instances, the Egenera/Dell system provided significant agility (ability to re-provision quickly) while maintaining a mission-critical level of availability.

The Dell/Egenera deal has been getting some profile lately, as it takes on similar technologies such as IBM’s Open Fabric Manager and HP’s Insight Orchestration products. Stay tuned for some more juicy news.

Last year, Egenera announced support for Dell rack mount servers, extending the Processing Area Network architecture to off-the-shelf, industry standard servers. This combination joins the best-in-class Unified Fabric architecture (PAN) with the best-in-class industry standard hardware from Dell. The solution earned early praise from customers and has proven to be a terrific offering for the market.

Now, Egenera is supporting Dell blade systems. This capability allows the PAN to be extended to blade servers from Dell and is the first and only solution on the market that spans rack and blade servers while, at the same time, leveraging only commodity components. No special hardware, no special switches. Everything is COTS based.

PAN on Dell blades leverages Ethernet as the fabric and standard servers as the processing nodes, providing a cost effective yet highly dynamic environment for the data center.

Our ability to quickly support different classes of servers is a testament to the PAN architecture. PAN is now running on Egenera BladeFrame EX and ES systems, Dell rack mount, Dell blades, and Fujitsu-Siemens BX600 blade systems. It’s a truly heterogeneous platform that allows the user to choose the best hardware option for their data center, but still enjoy the power of PAN, regardless of that choice.

That’s pretty powerful, and in my mind, exactly where the data center is heading.
