Goodbye Dual Controller Arrays. Hyperconvergence meets All Flash.

Say goodbye to your dual-controller storage arrays.

Flash has made high-performing clustered storage systems a reality.  With Nutanix’s NX-9000 series announcement, webscale and hyperconverged architectures can now handle the most demanding workloads, including those with large working sets.

Hyperconvergence.  Webscale.  Meet All Flash.

This announcement is a huge step forward for the storage industry.  Dual-controller and centralized storage architectures were popularized in the mid-1990s because they let you share files, provide higher levels of availability and protection, and RAID many shelves of spindles behind a storage controller for higher effective performance.  Centralized storage (and dual-controller arrays) also gained popularity over the last decade of virtualization, since at the time they were the only way to achieve the High Availability and VM mobility features touted by VMware and other leading hypervisor vendors.

Why are we still using an architecture designed to share files to host virtual machine disk files (i.e., VMDKs and VHDs), which normally belong to a single VM at any given time?  We shouldn’t be.  It’s time for the next major step in virtualization.

The emergence of SSDs, advancements in distributed systems, and hyperconvergence have made dual-controller and centralized storage architectures obsolete.  A single flash drive can supply orders of magnitude more random IOPS, at lower latency, than any 15k or 10k RPM spinning disk.  In fact, dual-controller and centralized storage architectures now limit the operational improvements, performance, and availability that flash-based storage can bring to your enterprise datacenter.

All Flash needs more.

  • Flash needs storage CPU to drive IOPS.  Dual-controller architectures have a fixed amount of CPU.  Need more?  Rip and replace, or deploy and manage yet another array.
  • Flash needs scale-out clustering.  Many controllers can run dedup and compression in software across much larger data sets, saving additional flash capacity.  Most dual-controller architectures can only dedup the data that fits behind them.
  • Flash needs a RAIN architecture to drive higher utilization.  Dual-controller architectures must be run at 50% utilization to tolerate a failure without performance impact.
  • Flash needs many medium-sized storage controllers rather than two I/O bottlenecks.  Companies run far more than two virtual machines, and those workloads benefit from being balanced across many controllers.
  • Flash needs Information Lifecycle Management (ILM).  Advanced flash storage software needs to balance data between [high-performance, high-endurance, low-capacity] flash drives and [low-cost, higher-capacity, low-endurance] flash drives, much as hybrid storage systems balance between flash SSDs and spindles today to optimize cost and performance.
  • Flash needs better garbage collection.  The dirty secret of the flash industry… more on this later.
  • Flash needs to sit as close as possible to the VM compute resources that are doing the I/O.  Dual-controller (centralized) architectures add a network hop to every read and consume more network bandwidth.
  • Most importantly, flash needs more than performance.  You can’t drown all the issues and complexity of traditional dual-controller architectures in IOPS and low latency.  Flash is only one part of the next-generation storage solution.
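The dedup point in the list above (a larger dedup domain finds more duplicates) can be illustrated with a minimal sketch. It assumes fixed-size chunking with SHA-256 fingerprints, and it uses tiny 4-byte chunks and made-up "VM image" byte strings purely for readability; real systems use far larger chunks, and nothing here describes how Nutanix or any array vendor actually implements dedup.

```python
import hashlib

def chunk_fingerprints(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and return a SHA-256 fingerprint per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def unique_chunks(datasets: list[bytes]) -> int:
    """Count the chunks that must be stored when all datasets share one dedup domain."""
    seen: set[str] = set()
    for data in datasets:
        seen.update(chunk_fingerprints(data))
    return len(seen)

# Hypothetical VM images: every image shares the same "OSOS" blocks.
array_a = [b"OSOSOSOSAPP1", b"OSOSOSOSAPP2"]
array_b = [b"OSOSOSOSAPP3", b"OSOSOSOSAPP4"]

# Two separate arrays: each dedups only the data that sits behind it.
per_array = unique_chunks(array_a) + unique_chunks(array_b)
# One scale-out cluster: a single dedup domain spans all the data.
one_domain = unique_chunks(array_a + array_b)

print(per_array, one_domain)  # 6 5
```

The gap grows with the number of separate arrays, because every independent dedup domain stores its own copy of the shared blocks.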



Scale-Out trumps Rip and Replace:

Traditional storage vendors live by a three-year rip-and-replace lifecycle.  Storage controllers must be swapped out to take advantage of advancements in Intel x86 processors.  With Nutanix’s revolutionary file system, new and old storage controllers can coexist in the same cluster, letting you immediately apply advancements in Intel computing technology to storage processing.  Scaling out with dual-controller arrays is a capacity-planning and operations nightmare.  With Nutanix’s scale-out clustering, you can increase your all-flash performance without a destructive and risky rip-and-replace cycle.


Scale-Out Storage Processing Power with Your Flash:

Flash requires storage processing power to drive IOPS and performance.  Why put all your flash behind two controllers?  Dual-controller architectures cannot drive large amounts of flash at full speed.  Spreading flash devices across many controllers drives higher aggregate performance, and that performance keeps increasing as you scale out the number of controllers.  All within a single datastore, namespace, and management domain.
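A simple bottleneck model shows why adding flash behind a fixed pair of controllers stops paying off. It is a back-of-the-envelope sketch under stated assumptions: deliverable IOPS is taken to be the lesser of total controller capability and total raw flash capability, and the per-controller and per-SSD figures are hypothetical round numbers, not measurements of any product.

```python
def aggregate_iops(controllers: int, iops_per_controller: int,
                   ssds: int, iops_per_ssd: int) -> int:
    """Deliverable IOPS is capped by the scarcer resource: controller CPU or raw flash."""
    controller_limit = controllers * iops_per_controller
    flash_limit = ssds * iops_per_ssd
    return min(controller_limit, flash_limit)

# Hypothetical figures chosen only to illustrate the shape of the curve.
IOPS_PER_CONTROLLER = 150_000
IOPS_PER_SSD = 50_000

# 24 SSDs behind two (both active) controllers: controller CPU is the bottleneck.
dual = aggregate_iops(2, IOPS_PER_CONTROLLER, 24, IOPS_PER_SSD)
# The same 24 SSDs spread across 8 nodes, each with its own storage controller.
scale_out = aggregate_iops(8, IOPS_PER_CONTROLLER, 24, IOPS_PER_SSD)

print(dual, scale_out)  # 300000 1200000
```

In this model the dual-controller array leaves three quarters of its flash capability idle. Real systems add other bottlenecks (network, software overhead) that the sketch ignores.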


Put Flash next to your VMs with Nutanix’s Data-Locality:

Data-locality matters.  The VM data-locality built into the Nutanix Distributed File System (NDFS) keeps the majority of write and read storage I/O on local flash, so most read I/O does not need to traverse the network.  This improves read latency while also reducing network bandwidth consumption.
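The read path that data-locality enables can be sketched as follows. The placement rule (first replica on the node where the VM writes, remaining replicas on peers) and the node names are assumptions made for illustration; this is not NDFS code, and it ignores VM migration, failures, and rebalancing.

```python
def place_replicas(writer_node: str, cluster: list[str], rf: int = 2) -> list[str]:
    """Hypothetical placement: first copy on the writing VM's node, the rest on peers."""
    peers = [node for node in cluster if node != writer_node]
    return [writer_node] + peers[: rf - 1]

def serve_read(replicas: list[str], reader_node: str) -> str:
    """Serve a read from local flash when the reader's node holds a replica."""
    return "local" if reader_node in replicas else "remote"

cluster = ["node-1", "node-2", "node-3", "node-4"]

# A VM on node-1 writes 100 extents; each extent keeps a copy on node-1.
hyperconverged = [place_replicas("node-1", cluster) for _ in range(100)]
# A centralized array: every extent lives on the array, away from the host.
centralized = [["array"] for _ in range(100)]

local_hc = sum(serve_read(r, "node-1") == "local" for r in hyperconverged)
local_array = sum(serve_read(r, "node-1") == "local" for r in centralized)
print(local_hc, local_array)  # 100 0
```

Writes still send the second replica across the network for redundancy; the saving is on reads, which stay on the host as long as a local copy exists.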


Erasure Coding provides Capacity Savings for RAIN-based architectures too:

One of the biggest criticisms of RAIN-based architectures is their reliance on data replication, which consumes more raw capacity than RAID parity.  Techniques like erasure coding will allow RAIN-based architectures to become more capacity-efficient (and competitive with the capacity savings of parity RAID), removing one of their key weaknesses.  As flash prices inevitably continue to fall, efficient scaling, performance, operational simplicity, and advanced architectures will trump other considerations.
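The capacity argument is simple arithmetic. With replication factor RF, only 1/RF of raw capacity holds unique data; with a k+m erasure-coded stripe (k data blocks, m parity blocks), k/(k+m) does. The sketch below compares a few common layouts. The specific schemes are illustrative choices, not a statement of what any product ships, and a k+m stripe needs at least k+m nodes (fault domains) to survive m node failures.

```python
from fractions import Fraction

def usable_replication(rf: int) -> Fraction:
    """RF full copies of every block: 1/RF of raw capacity is usable."""
    return Fraction(1, rf)

def usable_erasure(k: int, m: int) -> Fraction:
    """k data + m parity blocks per stripe: k/(k+m) of raw capacity is usable."""
    return Fraction(k, k + m)

# (layout, usable fraction, node failures tolerated)
layouts = [
    ("RF2 replication", usable_replication(2), 1),
    ("RF3 replication", usable_replication(3), 2),
    ("EC 4+1", usable_erasure(4, 1), 1),
    ("EC 4+2", usable_erasure(4, 2), 2),
]

for name, usable, failures in layouts:
    print(f"{name}: {float(usable):.0%} usable, tolerates {failures} failure(s)")
```

This prints 50%, 33%, 80%, and 67% respectively. At the same failure tolerance, a 4+1 stripe stores 60% more unique data than RF2 on the same raw flash, and 4+2 stores twice as much as RF3.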


All Flash Needs Better Garbage Collection:

All-flash needs garbage collection.  The dirty little secret of the flash industry is what has to happen in the background.  Foreground speeds are sexy, and everything looks good on a new array.  What occurs over time is a massive attrition of the flash.  The efficiency of background tasks is the unsexy secret of the flash industry, and it will expose the true character of all-flash systems.  You were impeccable and fast when you were new, with a single workload and little fragmentation.  What happens over time is life: a sobering fragmentation that slows you down.  Great flash storage software needs an awesome background framework for garbage collection.  Background tasks need CPU to introspect the system and retain its character.  Scaling out CPU allows more efficient background cleanup without impacting performance.  It also allows greater efficiency by using techniques like MapReduce to distribute the workload.
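To make the background-work argument concrete, here is a minimal sketch of greedy segment cleaning, the classic policy from log-structured storage: clean the segments with the least live data first, because they free the most space for the least copying. The segment layout, the 64-block segment size, and the per-node split are hypothetical, and the MapReduce shape is only mimicked with Python's built-in `map`. This is not how Nutanix (or any particular vendor) implements garbage collection.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seg_id: int
    live_blocks: int          # blocks still referenced
    total_blocks: int = 64    # hypothetical segment size

def clean_node(segments: list[Segment], max_victims: int) -> tuple[int, int]:
    """Greedy cleaning on one node. Returns (net blocks freed, live blocks copied forward)."""
    victims = sorted(segments, key=lambda s: s.live_blocks)[:max_victims]
    copied = sum(s.live_blocks for s in victims)
    freed = sum(s.total_blocks - s.live_blocks for s in victims)
    return freed, copied

def clean_cluster(nodes: dict[str, list[Segment]], max_victims: int) -> tuple[int, int]:
    """Map: every node cleans its own segments (in a real cluster, on its own CPU).
    Reduce: sum the per-node results."""
    results = list(map(lambda segs: clean_node(segs, max_victims), nodes.values()))
    return sum(f for f, _ in results), sum(c for _, c in results)

def make_segments(live_counts: list[int]) -> list[Segment]:
    return [Segment(i, live) for i, live in enumerate(live_counts)]

nodes = {
    "node-a": make_segments([60, 10, 40, 5]),
    "node-b": make_segments([30, 63, 2, 50]),
}
print(clean_cluster(nodes, max_victims=2))  # (209, 47)
```

Each node's cleaning is independent of the others, so adding nodes adds cleaning capacity. The pure-Python `map` here runs sequentially and only shows the shape of the computation.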


RAIN (Redundant Array of Independent Nodes) architectures drive higher availability and utilization than Dual-Controller RAID (Redundant Array of Independent Disks) architectures:

This is a big one.  What happens when you lose one of your dual controllers in a traditional array?  You’ve effectively lost half of your storage processing capability and performance.  To ensure there is no major performance or availability impact during a failure, dual-controller arrays should be run at 50% or less of their processing capacity.  What a huge waste.  With RAIN, you no longer need to limit your workload to 50% to tolerate failures without impact.  In a 10-node cluster, for example, you can run each node at effectively 90% of its processing capacity and still tolerate any single node failure without performance impact.  Consumption can be optimized further by letting multiple nodes assist in recovering from any failure.  This allows all nodes in the system to reach higher utilization while still maintaining high availability.
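The 50% and 90% figures follow from one line of arithmetic. If N nodes each run at utilization u and f of them fail, the surviving N - f nodes must absorb the total load N·u without exceeding 100%, so u ≤ (N - f)/N. The sketch assumes load redistributes evenly across the survivors and ignores rebuild traffic.

```python
def max_safe_utilization(nodes: int, failures: int = 1) -> float:
    """Highest per-node load such that the survivors can absorb the failed
    nodes' work without any node exceeding 100%: u <= (N - f) / N."""
    if not 0 <= failures < nodes:
        raise ValueError("need 0 <= failures < nodes")
    return (nodes - failures) / nodes

for n in (2, 4, 10, 20):
    print(f"{n:>2} nodes: {max_safe_utilization(n):.0%}")
```

Two controllers give 50%, a 10-node cluster gives 90%, and the headroom keeps shrinking as the cluster grows. Tolerating two simultaneous failures in a 10-node cluster (`failures=2`) still leaves 80%.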


Manage Resilient, Converged Resource Pools with VM-level Policies… not LUNs, FCP, iSCSI, Zoning, Volumes, WWPNs, WWNNs, Multipathing, etc.:

Performance is not the major problem in today’s enterprise storage; management, scaling, and operations are.  Flash can help provide consistent performance, but IOPS are not going to solve your largest pain points.  Storage wars have devolved into a war of performance.  At Nutanix we take a different approach, combining flash performance with a balanced focus on management and operations.  We keep things simple for administrators so they can focus on more important aspects of IT and the business.  Try racking, stacking, and deploying a 40-node Nutanix cluster.  It’s as easy as sliced bread (and, for that matter, much easier than slicing LUNs and aggregates).



Don’t fall for The Fiddler’s tunes:

One of the more famous villains in the “Flash” series of comics was “The Fiddler”, Isaac Bowin:

“In his early years, Isaac Bowin was a petty thief working the streets of India. While Bowin was attempting to rob a local merchant, local authorities found and apprehended him, whereupon he was taken to prison. During his incarceration, Bowin met an old Hindu fakir who used the power of music to hypnotize and control the actions of a deadly cobra. Bowin pressured the fakir into teaching him the mystical powers of the East, and the Hindu finally relented. Bowin proved an apt student, and before long, his knowledge surpassed that of his teacher. Using random materials found in his cell, Isaac fashioned a crude fiddle and used his new knowledge to escape from prison. He no longer required the services of the old fakir, so he used the power of his fiddle to murder him. Afterwards, he tracked down the merchant he attempted to rob earlier and executed him as well. Calling himself the Fiddler, Bowin returned to the United States to begin a new life of crime.”

The Fiddler used his musical powers to hypnotize and enslave the population of Keystone.  What was the Fiddler playing?  The same old tune on the same violin.  (Violin – perhaps there is some irony there?)

A dual-controller flash array from EMC, NetApp, or any other vendor isn’t a dramatic change from the status quo.  It has the same problems and architectural limitations as your existing EMC and NetApp arrays.  It can’t scale out well.  You need to rip and replace it to upgrade.  You have to run it at 50% of its performance capacity to tolerate a failure without a performance impact.  Same old NAS/SAN, with flash.

Flash is just one ingredient in next-generation storage architecture.  By itself, it doesn’t change the game.  The big storage companies understand this well.  NetApp tried to fix its WAFL scaling limitations with clustered ONTAP.  EMC has tried to fix its own platform limitations by buying every company it could afford (and trying to purchase us).  We aren’t for sale.  We will continue to employ the top distributed systems talent from the leading web companies.  We are here to offer you something new.

Hyperconvergence.  Webscale.  Meet All Flash.

