Monday, December 15, 2014

ACI: Two basic questions.

Posted:  Dec 15th, 2014 Authors: Chad Hintz and Cesar Obediente                                                 


In the past year we have done several ACI presentations to customers in different segments. Lately, the two main questions that consistently arise from customers are:

·      How “Open” is ACI?
·      How much does ACI cost?

We thought it was appropriate to address these two issues on this blog to serve as a reference for our customers and partners.

In addressing the “How open is ACI?” question, we first need to understand that ACI is designed as a "system" and must be understood as a complete solution. It gives customers the ability to define their Application Network Profile (ANP) in order to speed the deployment of applications onto the network, while helping customers automate their network, something that has long been missing in networking.

If we look a little closer at how ACI addresses the openness question, we can break it down into three areas:


  • Northbound
    • ACI provides a full range of RESTful APIs so that 3rd-party applications like Splunk, Tivoli, etc. and Cloud Management Systems (CMS) like OpenStack, vCAC and CloudStack can collect information from and configure the ACI controller (APIC). The APIC supports both JSON and XML data formats (see the short sketch after this list).
    • Customers and 3rd-party partners are encouraged to visit http://developer.cisco.com/site/apic-dc
  • Connecting to the Fabric
    • ACI allows any 3rd-party switch or router to connect to the ACI fabric via standard routing protocols such as OSPF and BGP.
    • ACI provides the flexibility for customers to attach to the fabric with any hypervisor, such as ESXi, Hyper-V or KVM, as well as the flexibility to connect containers.
    • ACI allows customers to connect any workload to the fabric via bare-metal or virtual machines.
    • ACI allows customers to choose best-of-breed L4-L7 services: customers can decide to connect any vendor's appliance, physical or virtual, via the device package option.
      • http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/solution-overview-c22-732445.html
    • Cisco ACI allows any type of IP storage to be connected to the fabric.
  • Inside the Fabric
    • Inside the fabric, ACI leverages standard protocols such as VXLAN, IS-IS, BGP and OpFlex.
    • At the moment the Nexus 9000 (N9K) is the only platform in the industry capable of being part of the ACI fabric. The main reason is that the N9K has unique capabilities in its ASICs that provide specific functions within the fabric. We have covered some of these capabilities in previous blogs.
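
To make the northbound point concrete, here is a minimal Python sketch of talking to the APIC REST API. The hostname and credentials are placeholders, and the tenant query is only one example of the object classes the API exposes; treat it as an illustration rather than a complete client.

```python
# Minimal sketch: authenticate to the APIC and list tenants over the JSON API.
# The APIC address and credentials below are placeholders.
import requests

APIC = "https://apic.example.com"
session = requests.Session()

# Log in; the APIC returns a session cookie that the Session object reuses.
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login, verify=False)

# Query a class of objects as JSON (append .xml instead for XML output).
resp = session.get(f"{APIC}/api/node/class/fvTenant.json", verify=False).json()
for obj in resp.get("imdata", []):
    print(obj["fvTenant"]["attributes"]["name"])
```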


Figure 1 summarizes the openness of ACI and how it addresses customers' concerns in this area.

Figure 1






The second topic we would like to address in this post is: how much does ACI cost?  During our presentations to our customers, we have explained that they need to think about two different “costs”:


  • The Underlay Cost: this is the cost of the physical infrastructure such as switches, optics and cabling
  • The Overlay Cost: this is the cost of providing a virtual overlay

To better understand this concept, let's consider the following design (a quick sizing sketch follows the list):
  • 4-way Spine
  • 100 ToRs
  • Oversubscription ratio of 2:1
  • Each ToR has 32 attached servers with an average of 30 VMs per port
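
A rough back-of-the-envelope sizing of this design, assuming 10GbE host ports and 40GbE uplinks (our assumptions for the sketch, not something fixed by the list above):

```python
# Back-of-the-envelope sizing for the example fabric above.
spines = 4
tors = 100
servers_per_tor = 32
vms_per_server = 30

total_vms = tors * servers_per_tor * vms_per_server
print(total_vms)                      # 96,000 VMs per fabric

downlink_gbps = servers_per_tor * 10  # 320 Gbps of host-facing bandwidth per ToR
uplink_gbps = downlink_gbps / 2       # 2:1 oversubscription -> 160 Gbps
print(uplink_gbps / 40)               # 4 x 40GbE uplinks, i.e. one per spine
```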


This design is very common for medium to large customers and equates to about 96,000 VMs per fabric.  Below we are going to compare how much it costs to build an ACI fabric vs. building a merchant silicon fabric as the underlay with an overlay technology such as NSX or Nuage.  Disclaimer: the pricing we are going to use for this comparison is list pricing.

Cisco ACI Configuration









Merchant Silicon

NOTE: We will be using Arista as the Underlay




Now we need to add the cost of the overlay to the Arista underlay.  Unfortunately, we don't have a precise list price for the overlay, since we have heard ranges from $1 to $50 per VM per month.  We have also heard that some vendors include this overlay cost in their Enterprise License Agreement (ELA).

Therefore, if we take the low end of the spectrum for this exercise, which is $1 per VM per month, and follow the same model as above with 96,000 VMs per fabric, this equates to $96,000 per month, $1,152,000 per year, or $5,760,000 over a 5-year period.
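
The arithmetic behind those numbers, as a small sketch:

```python
# Overlay licensing cost at the low end of the quoted range ($1 per VM per month).
vms = 96_000
cost_per_vm_per_month = 1              # USD; the quoted range is $1-$50
monthly = vms * cost_per_vm_per_month  # 96,000
yearly = monthly * 12                  # 1,152,000
five_year = yearly * 5                 # 5,760,000
print(monthly, yearly, five_year)
```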

We also need to include the cost of the servers that are required to run the overlay (controllers, gateways, etc.). Unfortunately, we don't have an exact model for how many servers are required to run this type of solution, but we have been told that a good rule of thumb is to dedicate about 5% of the compute in the overall solution; in this particular case that adds roughly 80 servers. Because of the many variables involved, we decided to leave this cost out of the comparison, but customers should take it into consideration.

Adding the overlay cost to the Arista underlay, the total cost of the solution over a 5-year period comes to $12,602,760.

As you can see from this analysis, the gap between Cisco ACI and Merchant Silicon (Arista) + Overlay is $7,355,023.  That's a pretty significant figure for any customer that is considering building a private cloud.

Summary Table

Below are a couple of sample configurations, one for a medium-to-large fabric and another for a small fabric.  These comparisons show the true savings of a converged overlay/underlay versus a disaggregated solution.



Closing


In closing, having an integrated solution in your network that includes both the overlay and the underlay is not only beneficial from a technical point of view, as we have explained in previous posts, but also makes sense from a financial point of view.

Bonus Material

Here is our interview with Soni Jiandani.  Soni is the Senior Vice President of the Insieme Business Group at Cisco.  She is well recognized, with over 20 years of experience in the switching industry, and has been part of a number of projects at Cisco, including the Catalyst 5000, Catalyst 6000, Andiamo, Nuova and, most recently, the founding team of Insieme.





Friday, November 14, 2014

ACI Specialist Certifications Now Available!

We want to let everyone know that Cisco has announced two additions to its certification portfolio:

  • Designing with Cisco Network Programmability for ACI - 600-511 NPDESACI
    • This exam tests a candidate's ability to use network applications expertise to translate customer requirements into a policy-based, application centric network infrastructure.
  • Implementing with Cisco Network Programmability for ACI
    • This exam tests the ability of network engineers to deploy highly automated network architectures leveraging a policy-based controller integrated into the infrastructure. Successful candidates will demonstrate that they can deploy, install, and troubleshoot network infrastructures and applications.

These two new certifications are the first series of exams available for ACI.  We have mentioned in the past how ACI is going to change how Data Centers are deployed, and these certifications will be very valuable for anyone looking to increase their knowledge of ACI.


Please reach out to us if you have any questions, and good luck on your exam!



Monday, October 20, 2014

Fabric Innovations in ACI

In this blog we are going to focus on how ACI utilizes unique fabric characteristics to provide dynamic load balancing and mice/elephant flow detection, along with a systems-based approach to networking. We are not going to go into great detail in written form; instead, we invite Dr. Mohammad Alizadeh back for a video to discuss his paper "CONGA: Distributed Congestion-Aware Load Balancing for Datacenters," which received Best Paper at SIGCOMM 2014, along with how and why this is implemented in ACI!





To download this paper please use this link: 
Sigcomm14 CONGA Paper
To download the slides please use this link:
Sigcomm14 CONGA Slides


Thursday, July 31, 2014

Next Generation Data Center - Spine Leaf Fabric Design

Posted: July 31st, 2014
Authors: Chad Hintz and Cesar Obediente

Next Generation Data Center - Spine Leaf Fabric Design

In this blog we are going to focus on how new data centers are being built and the benefits of the newer platforms in the market today, from modular chassis to top-of-rack switches.  The goal is to give you a clear understanding of the new architectures in the Data Center and their benefits.  As we are accustomed to doing, we have invited a very special guest for a quick interview at the end of this blog.

Before we start explaining the new ways to design a Data Center, let's take a step back and understand how Data Centers have been built in the past, and why.

In the past, the majority of networks we've built followed what we call a 3-Tier model, which is represented in the following topology:






The idea behind this topology is that almost all of the traffic was North-South, meaning traffic destined to the Data Center was also leaving the Data Center.  That's the reason we built a 3-Tier topology with Core, Distribution and Access layers.  If we had to insert network services such as load balancers, firewalls, etc., those services would be attached to the aggregation layer.  As noted, this architecture was excellent for North-South traffic.  The problem arises when a new set of applications is created that requires communication between PODs, what we call East-West or server-to-server traffic, which in turn requires a different type of architecture.

It is estimated that in today's Data Center 76% of the traffic stays within the Data Center, 17% of the traffic leaves the Data Center, and 7% of the traffic flows between Data Centers.  Now the question becomes: what kind of topology best addresses today's Data Center requirements?  Because of the server-to-server communication, we had to find an architecture with the following characteristics:


  • An equal hop count between any two devices
  • Consistent latency between any two devices

The best way to meet these requirements is by building what is called a Clos network, also referred to as a Spine/Leaf network.  Charles Clos formalized the Clos network in 1953; it has three stages, the ingress stage, the middle stage and the egress stage, connected in a crossbar fashion.  In today's Spine/Leaf network every spine connects to every leaf, but the spines do not connect to each other.  See the diagram below.
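
As a quick illustration of that wiring rule (the node names and counts here are arbitrary, not a recommendation), a few lines of Python can check the equal-hop-count property:

```python
# Minimal spine/leaf wiring sketch: every leaf connects to every spine and
# spines never connect to each other, so any two leaves are exactly two
# hops apart (leaf -> spine -> leaf) via any spine.
spines = [f"spine{i}" for i in range(1, 5)]
leaves = [f"leaf{i}" for i in range(1, 9)]

links = {(leaf, spine) for leaf in leaves for spine in spines}
print(len(links))  # 8 leaves x 4 spines = 32 links

for a in leaves:
    for b in leaves:
        if a != b:
            # Both leaves attach to every spine, so each spine is a valid
            # single intermediate hop between them.
            assert all((a, s) in links and (b, s) in links for s in spines)
```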











Now that we understand why we are migrating from a traditional 3-Tier architecture to a Spine/Leaf architecture, it is important to understand that the best way to build this architecture is to choose the right hardware components and the right bandwidth/oversubscription ratio.

Leaf Layer

Let us begin by analyzing the Leaf Layer, as this is probably the most important layer when deciding how to build your fabric: it is where the servers connect, and it is where the "incast" situation occurs. Before we continue analyzing the Leaf Layer, let us understand what incast is. Incast is when many devices communicate with one device: you have a network with 10 nodes, and 9 of those nodes are all talking to 1 node. You may ask what kind of application is designed that way; in reality several applications behave this way, for example Hadoop, MapReduce and multicast applications, to name a few.

Because of this behavior, the Leaf Layer needs specific characteristics to handle the incast situation. One requirement to evaluate is how much buffer a leaf switch provides, both because of incast and because this layer is where the speed mismatch occurs between host ports and uplink ports: servers are connected at either 1GbE or 10GbE, but the uplink ports are 40GbE.

If we compare typical leaf switches, they are built from what is called "Merchant Silicon", "Custom Silicon" or the newest category, "Merchant+".

Here is a table with the main differences between the different types of ASICs:


                   Merchant                                        Custom                 Merchant+
Companies using    Cisco, JNPR, HP, Arista                         Cisco                  Cisco
Buffer             Trident+ 9 MB / Trident2 12 MB / Alta 9.5 MB    Depends on the ASIC    52 MB
VXLAN Routing      Alta: yes; Trident family: no                   Depends on the ASIC    Yes


Spine Layer

Next we take a closer look at the Spine Layer; this is where the leaves connect.  Depending on the size of your Data Center fabric, you could choose to build this layer with a modular chassis or a fixed-configuration switch.  The placement of this layer in your Data Center is very important: you want to make sure it is centrally located so that the leaves are all roughly the same distance from it.

The contrast with the Leaf Layer is that the Spine Layer only requires "enough" buffer to absorb small bursts in the network, because at this layer every link runs at the same speed, i.e. there is no speed mismatch.


Oversubscription Ratio

Finally, we are going to take a closer look at the two most common questions we get asked: "Should I use 40GbE or 10GbE as my uplink ports, and how much oversubscription should I have in my fabric?" As you can imagine, every networking question has its "it depends" answer. Let's start by answering the first question: should we use 10GbE or 40GbE?  With the price point of today's 40GbE optics, there is no doubt we should be building our fabric links with 40GbE. Another reason we recommend 40GbE uplinks is the "speed-up" effect at the uplink.  Historically there has always been a speed-up between the server connection and the uplink: servers connected at 1GbE with uplinks at 10GbE.  The main reason for this speed-up is to avoid congestion at the uplink when multiple servers are sending sustained amounts of data.

The answer to the second question, regarding oversubscription, depends on the number of servers you attach to the leaf, but more importantly on the type of leaf you decide to purchase.  For example, your "typical" leaf today, built on the Broadcom Trident+ ASIC, has 48 x 1/10GbE ports plus 4 x 40GbE uplinks; the newer Trident 2 family comes in different form factors, from 96 x 1/10GbE plus 8 x 40GbE to 48 x 1/10GbE plus 6 x 40GbE; and the Merchant+ option from Cisco offers 48 x 1/10GbE plus 12 x 40GbE, to name a few.

Here is the formula on how to calculate the Oversubscription Ratio:
Oversubscription Ratio = (Host ports * Host port bandwidth) / (Uplink ports * Uplink port bandwidth)

Different scenarios:

Trident+
48 servers connected at 10GbE with 4 uplinks at 40GbE. You would have a 3:1 oversubscription ratio.

Trident 2
48 servers connected at 10GbE with 6 uplinks at 40GbE. You would have a 2:1 oversubscription ratio.

Merchant+
48 servers connected at 10GbE with 12 uplinks at 40GbE. You would have a 1:1 oversubscription ratio.
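
Putting the formula and the three scenarios together, a small sketch (the function name is ours, purely for illustration):

```python
# Oversubscription = (host ports x host speed) / (uplink ports x uplink speed).
def oversubscription(host_ports, host_gbps, uplink_ports, uplink_gbps):
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)

print(oversubscription(48, 10, 4, 40))   # Trident+  -> 3.0, i.e. 3:1
print(oversubscription(48, 10, 6, 40))   # Trident 2 -> 2.0, i.e. 2:1
print(oversubscription(48, 10, 12, 40))  # Merchant+ -> 1.0, i.e. 1:1
```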

As you can see, the oversubscription ratio is variable and depends on a couple of factors:

·      Application resiliency
·      Overall Budget

Once you have decided on the right oversubscription ratio, the next question to address is how wide the Spine layer is going to be: 4, 6, 8 or 12 spines?  In order to answer this question we need to look at two options:

  1) One uplink per Spine
  2) Multiple uplinks per Spine

Our recommendation is to map one uplink per Spine.  This means that if you were using a traditional Trident+ box, which has four uplinks, you would have four spines in your fabric, and each uplink from the leaf would connect to a different spine.

Closing

This post has introduced several key aspects of building a next generation data center, from the evolution of the 3-Tier Data Center to the Spine/Leaf architecture and the different components of that architecture.

Bonus Material

Here is our interview with Dr. Mohammad Alizadeh. Dr. Alizadeh works for Cisco in the office of the CTO with the INSBU.  He has a PhD from Stanford University and has concentrated his research on Data Center congestion control.  Some of his work includes the Data Center TCP (DCTCP) congestion control algorithm, which has been implemented in the Windows Server 2012 operating system.


Dr. Alizadeh is going to cover his latest research on Data Center congestion control and his findings.