How Nutanix and IBM are Powering the Next Wave of Linux-based Web, Database, and Deep Learning Applications


Building the next version of Skynet?  Or perhaps some more altruistic implementation of Artificial Intelligence not hell-bent on destroying the human race?  Nutanix and IBM are here to help by making the infrastructure for modern data crunching, deep learning, and Linux-based web applications easier to deploy, manage, and scale.

At Nutanix, two of our core missions include:

  1. Creating software that makes it easier to adopt webscale technologies within enterprise IT environments.
  2. Freeing customers to choose the best platforms, hypervisors, and clouds for their environments with software that lets them seamlessly manage across these infrastructure constructs, as well as migrate between them.

Several years ago Nutanix started out by using distributed systems technologies such as Cassandra and Zookeeper to virtualize storage as a distributed storage fabric, enabling enterprises to move away from centralized SAN/NAS storage arrays.  We expanded by building a robust, secure, and easy-to-manage hypervisor based on KVM through Nutanix Acropolis. Since then, we’ve also broadened the platforms our software can manage and integrate with:

  • Servers from Supermicro, Dell, Lenovo, Cisco, and HP.
  • Hypervisors from VMware, Microsoft, Citrix, and Nutanix.
  • Public cloud integrations with Amazon AWS, Microsoft Azure, Google Cloud Platform, and Nutanix’s Xi Cloud and Calm lifecycle automation software (much more coming in this area soon!).

In this blog I’m excited to talk about our newest server platform, one emerging from our partnership with IBM and based on their Power processor architecture.

 


Unleashing IBM Power

Power-based systems excel at reliability and performance, which is why they remain at the core of many mission-critical enterprise workloads. Power-based systems run some of the largest Oracle Database, Epic, and SAP HANA implementations out there–all mission-critical applications known to be extremely demanding in terms of performance and uptime.  Despite IBM’s success in this arena, it has struggled to displace Intel’s x86 server technology for general virtual computing workloads. A combination of forces slowed enterprise IT adoption of Power-based systems:

 

VMware never embraced non-x86/x64-based server technology.

When the first wave of modern server virtualization happened in the 2000-2010 time frame, VMware became the de facto standard.  Although virtualization had long existed in other forms on Linux and mainframe systems, VMware brought it to the next level of simplicity. Many enterprises began virtualization initiatives, intending to move as many of their server and datacenter workloads to VMware as possible. VMware was so ubiquitous that x86 became synonymous with virtualization. Among the few workloads left behind were databases, ERP, and mission-critical systems, which most customers kept on bare metal due to the perceived overhead of virtualization at the time. VMware did not support IBM’s Power-based, System i, or mainframe systems, which meant that IBM’s server technologies missed out on this entire shift.

 

Power systems required a specialized skill set to deploy and manage.

Power systems used their own virtualization technology, PowerVM (and eventually PowerKVM), which meant that they did not benefit from the VMware ecosystem and could not easily be managed alongside virtualized x86 clusters. There was slight progress in this area in the past few years, when VMware vRA added some integration for managing OpenStack/PowerVM Power-based environments, but the two remained “two different worlds.”

 

Microsoft application and Linux application ecosystem support was lacking.

Microsoft’s server products are still not supported on the Power architecture, and these products are an important part of many companies’ IT infrastructures, including email, directory services, and custom applications based on the .NET framework, ASP.NET, and SQL Server.  Also, Linux applications written in compiled languages like C or C++ must be recompiled for the Power processor (ppc64 or ppc64le) before they can run on it.  While this is not a difficult process, some Linux application vendors do not publish binaries for ppc64 or ppc64le.
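To make the binary-availability problem concrete: install scripts commonly choose which prebuilt artifact to download based on the architecture the kernel reports (`uname -m`, or `platform.machine()` in Python). The sketch below is hypothetical; the `binary_name` helper and the `<app>-linux-<arch>.tar.gz` naming scheme are invented for illustration and do not reflect any particular vendor's layout. When a vendor ships no ppc64le build, a script like this has nothing to fetch and the only option is compiling from source.

```python
import platform
from typing import Optional

# Hypothetical naming scheme: maps the machine name reported by
# `uname -m` (and platform.machine()) to an artifact suffix.
ARCH_SUFFIX = {
    "x86_64": "amd64",
    "ppc64le": "ppc64le",  # little-endian POWER (e.g. POWER8/POWER9 Linux)
    "ppc64": "ppc64",      # big-endian POWER
}


def binary_name(app: str, machine: Optional[str] = None) -> str:
    """Return the (hypothetical) prebuilt artifact name for a host."""
    machine = machine or platform.machine()
    suffix = ARCH_SUFFIX.get(machine)
    if suffix is None:
        # No vendor binary for this architecture: the app must be compiled from source.
        raise RuntimeError(f"no prebuilt {app} binary for {machine}; compile from source")
    return f"{app}-linux-{suffix}.tar.gz"


print(binary_name("myapp", "ppc64le"))
```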

 

These forces did a great deal to drive x86 adoption.  But when one company holds 95%+ of the server market, competitive forces and innovation are bound to challenge the status quo.

 

Simplification of IT infrastructure operations and application portability is leading to a renaissance of platform exploration

Advances in virtualization, hyperconvergence, DevOps, containerization, and other areas have simplified operations and dramatically reduced the amount of time needed to deploy and manage infrastructures. These changes in turn have spurred a renaissance of platform exploration as companies search for greater power and performance efficiencies. We’ve seen this with the emergence of RISC-based ARM, with its focus on power-efficient processors for mobile, laptop, and IoT-enabled devices, as well as with the proliferation of Nvidia’s GPU technology, which can be substantially more efficient than x86 for machine learning algorithms and AI.  In light of these shifts, when Google began porting all of its applications to run on Power 8 and Power 9 systems, we definitely took note.


 

All of these transformations made it clear that Nutanix, IBM, and our respective customers had a lot to gain from a joint partnership.

 

Manage your IBM CS821 and CS822 hyperconverged systems alongside your Nutanix x86 environments with the same Nutanix toolset and interface.

Nutanix Acropolis and Prism, arguably the simplest hypervisor and storage management systems available, can now manage IBM’s hyperconverged systems. For customers who have already adopted Nutanix x86 hyperconverged systems, you can even manage an x86 cluster and an IBM Power cluster side-by-side using the exact same interface and Prism Central instance. With Nutanix Acropolis, you no longer need different tools and a specialized skill set to manage and deploy Power systems. A Nutanix and IBM Power solution lets you choose the right processor and server architectures for your applications, without any silos in infrastructure deployment and operations. The ecosystem around Nutanix Acropolis is also growing, with many networking, backup, security, devops, and IT operations vendors announcing support for Acropolis management and APIs through the Nutanix Ready program.

 

Linux ecosystem support has been consistently growing through IBM’s OpenPOWER initiative.   

Linux-based applications have led major industry shifts, such as the last decade of “Big Data.” You could even say that Linux and Open Source led the development of public cloud, hyperconvergence, and virtualization (Amazon AWS, Google Cloud Platform, VMware’s ESXi, and Nutanix Acropolis, Prism, and the Nutanix Data Fabric are all heavily rooted in Linux and Open Source). IBM continues to broaden its support of Linux applications for the Power platform through the OpenPOWER initiative. Nutanix is also committed to developing and broadening the use cases that will benefit from this new joint platform.


More importantly, we see a lot of new investment coming into Deep Learning and AI, which is shaping how these technologies help solve important business problems across many industry verticals. IBM has been at the forefront of developing this technology and applying it to enterprise applications, especially in healthcare, financial services, intelligence, SLED/research, insurance, and retail systems.  They’ve even optimized their platforms for these new applications, for example by integrating NVIDIA’s NVLink to increase CPU-to-GPU bandwidth.

Over the next five years we expect rapid increases in enterprise adoption of Deep Learning frameworks such as Apache MXNet, TensorFlow, and IBM’s own PowerAI, which is based on Caffe, Torch, Theano, and TensorFlow. We are especially excited to partner with IBM in this area to ensure that our customers can power the next wave of cognitive applications with infrastructure that’s simple to deploy, manage, and scale.

 


 

Many Linux applications run better on Power.

There are many use cases and workloads that are a great match for this new platform. The following use cases are just a few examples:

Web Applications: LAMP, Spring/Java, IBM WebSphere, Django, Ruby on Rails, WordPress, Drupal, and the newest wave of Linux-based web application stacks (MEAN, MERN, Laravel, React.js, Meteor, etc.)

Standard Databases: PostgreSQL, MariaDB, DB2, MySQL

NoSQL Databases: MongoDB, Neo4j, Cassandra, CouchDB, Redis, HBase

Big Data and Analytics: Hadoop, Hortonworks, Apache Spark, etc.

Cloud/Containers: Nutanix Acropolis, OpenStack, Docker

DevOps: Chef, Puppet

Operational Intelligence/SIEM: ELK stack (Elasticsearch, Logstash, Kibana), Apache Solr

Cognitive and AI: IBM’s PowerAI Framework, TensorFlow, Apache MXNet, Torch/PyTorch, Caffe, DL4J, Theano, Chainer

Custom Vertical Apps: Linux-based apps in the Core Banking, Healthcare, Research/SLED, Federal, Financial Services/Insurance, and Retail sectors


What’s coming? – Nutanix and IBM:

We’ve received a ton of customer and channel partner interest in the solution since our product announcements at .NEXT this summer. The product became generally available (GA) in the last week of September, and we have been shipping the first wave of IBM CS821 and IBM CS822 systems this past month.  As customer adoption continues to increase, we will document more detailed reference architectures, best practices, customer case studies, and additional use cases for the platform.

We have a lot more in the works for this partnership, with many announcements to come. In every instance, however, Nutanix will make it easier for enterprises to choose the servers, platforms, and clouds that best suit their needs, and, in the process, free IT to spend less time on operations and more time on innovation. IBM Power and Nutanix: First-rate performance and reliability meets unrivaled simplicity and scale.

 

For more information:

Now is a great time to explore how IBM’s first hyperconverged platform can help you achieve new efficiencies in performance, scalability, and IT operations. To learn more about the solution or if you are interested in a POC, please contact your local IBM and Nutanix sales representatives or partners, or send an email to ibm@nutanix.com.


 

Resources for IBM and Nutanix Power Solution:

IBM and Nutanix Press Release – PR Announcement of the joint solution

IBM CS821/CS822 Datasheet – Technical specifications for the IBM and Nutanix Power servers

IBM and Nutanix Interview – A video interview of Nutanix and IBM discussing the joint solution

IBM and Nutanix Announcement Blog – Blog discussing the announcement and general use cases for the solution

Nutanix’s IBM Power Website – Overview, additional resources, and contact information for the Nutanix IBM Power team

 

More Information on Nutanix:

How Nutanix Works – A video overview about the Nutanix Solution.

How Nutanix Works, Nutanix Storage Deep Dive – A video regarding the design premise and benefits of hyperconverged storage

Hyperconverged Infrastructure: The Definitive Guide – An overview of how Nutanix and hyperconvergence have changed the IT landscape

The Nutanix Bible – Deep and comprehensive guide to the internals of the Nutanix technology

Nutanix Training and Certifications – Online training for NPP, NPSR, NPX, NPSS and other Nutanix technical and sales certifications

Nutanix vs. VSAN Price and Performance — Part 4 (or why Chuck and EMC are deathly afraid of us)


We have recently suffered through several blog posts from Chuck Hollis, EMC’s chief blogger and strategist on loan to VMware.  Chuck released these blog posts to coincide with (and distract from) our first .NEXT user conference in Miami earlier this month.  We declined to respond to them at the time.  Our company was proud to be launching its first user conference and we wanted the focus during the last few weeks to be on our customers, their experiences with our product, and our newest technology developments and announcements.

For me personally, it was an exciting milestone to see Nutanix move from our humble beginnings of assembling our own IKEA desks and running Ethernet cable across the floor to hosting a conference dedicated to our users.  I never imagined we would make it so far in such a short time.  I also didn’t realize we’d attract so many mortal enemies bent on our destruction along the way. This is the cost of disrupting an industry filled with entrenched interests.

It’s our real-life version of HBO’s Silicon Valley, and Hooli, Gavin, Chuck, and EMC will stop at nothing to steal, stifle, and stall innovation to protect their monopoly.  Lawyer fracases, intentional misdirection, blatant misinformation, and “brainrapes.”  It’s all on the table.

We try not to let it distract us.  As our CEO Dheeraj said during his first keynote speech, invoking Amazon’s Jeff Bezos, “We will stay focused on the customers, while our competitors stay focused on us.”  Full disclosure: we were also busy celebrating with our incredible customers and partners at .NEXT, reminiscing about the adventures of the past five years but also looking forward to the next stage.


We are back home from Miami now and we owe Chuck some sort of acknowledgement for all the recognition he has given us in the last few weeks.  His blog gives the pretense of an “objective” comparison of performance and price. What this pretense sidesteps entirely (more accurately, what it attempts to obscure) is the profound difference that design and architecture make—in terms of simplicity, reliability, and performance under realistic workloads. In other words, the things that customers care about most.

A few questions came to mind from his most recent rumination…

 

Why are Chuck, VMware, and EMC targeting Nutanix?

EMC is losing ground, but VMware is a massive $37B company.  They have seen tremendous success in the last decade or so.  Their hypervisor runs in most enterprise datacenters on the face of the Earth, and they certainly have no trouble attracting folks to VMworld, which boasts 22,000+ attendees.  But VMware is also facing major competition with the public cloud players. Hyperconvergence is something that makes infrastructure solutions like VMware simpler and more attractive.  We’ve had some great success as solution partners.  So why is their chief strategist running a 20-part blog series targeting Nutanix?  Why keep attacking us? Why are they paying so much attention to Nutanix at all levels of their company? (And how much influence is EMC having on their behavior?)  VMware has never felt the need to publicly single out a company of our size.  Ever.  So why now?

I think the Twitterverse, our customers, and our partners have no delusions about what’s animating Chuck and EMC.

 


Chuck and EMC are afraid of us.  But why?

It’s not about a new hypervisor.  It’s not even about our storage-related technologies.  Amazon has polarized the IT landscape, and Nutanix is bringing the same “one-click” simplicity to managing on-premise infrastructure and bridging it with the public cloud.  At Nutanix’s core is a distributed management fabric built with these technologies in mind.    Chuck, EMC, and VMware can’t retrofit their management stack to meet the demands of this new landscape… it needs a complete re-write. 

 

VMware’s vCenter is starting to show its age and complexity

VMware vCenter has been the gold standard in managing on-premise VM-based environments for the last decade.  Unfortunately, it really hasn’t changed much since then.  In fact, it’s even gotten more complex.  When I was a budding VMware architect for Accenture’s R&D labs, it was a fun and exciting technology on the bleeding edge.  Now vCenter has been bloated with a set of features that most of their customers pay for but frankly don’t care about or use.  Features that haven’t made their lives easier or their businesses more effective.  This decade of bloat complicates a solution that was built on the original merits of simplicity, consolidation, cost savings, and the core features of HA, DRS, and (the crowd favorite) vMotion.

I’ll illustrate my point by describing what should be a fairly simple process — setting up a resilient vCenter deployment for my datacenter:

 

  1. Provision two Windows Platform Services Controllers (PSCs) on HCL-supported operating systems.
  2. Run Windows update and fully patch both PSC VMs.
  3. Join the PSC VMs to your Windows domain, and reboot.
  4. Mount the vCenter ISO on PSC #1 and run the installer, deploying an external PSC. Join an existing SSO domain, or start a new SSO domain depending on your requirements.
  5. Mount the vCenter ISO on PSC #2 and run the installer, joining the second PSC to the first PSC/SSO domain.
  6. Manually configure your third party load balancer (F5, NetScaler, etc.) per VMware instructions for HA PSCs.
  7. Provision one vCenter Windows Server VM on an HCL-supported operating system.
  8. Run Windows update and fully patch the vCenter VM.
  9. Install the Desktop Experience/Flash player on the vCenter VM.
  10. Run Windows update again to patch Flash/desktop experience.
  11. Provision a pair of HCL-listed clustered SQL servers for database high availability. Do not use SQL AlwaysOn Availability groups, as this is not supported. Deploy a traditional SQL cluster for HA.
  12. Manually create vCenter databases in SQL.
  13. Install the ODBC driver on the vCenter server, using an HCL-supported SQL version.
  14. Create vCenter service account.
  15. Create ODBC connection to SQL database.
  16. Mount the vCenter ISO on the vCenter VM and start the installation process.
  17. Deploy vCenter, using the HA PSCs and HA SQL servers.
  18. After vCenter is installed, add ESXi hosts to vCenter.
  19. Review VUM SQL HCL, and create a VUM database on a supported SQL version.
  20. Create VUM database in SQL.
  21. Create VUM ODBC connector on vCenter server.
  22. Install VUM on vCenter server, and configure downloads/schedule.
  23. Scan ESXi hosts for update and patch as needed.
  24. Use Derek Seaman’s SSL toolkit and VMware Certificate manager and manually create/deploy SSL certificates for PSCs, vCenter, VUM and ESXi hosts. Go Team Derek!
  25. Update load balancer SSL certificates to support high availability.
  26. Configure VMware HA to protect PSCs, SQL, and vCenter VMs.
  27. Configure NTP on all ESXi hosts, or configure host profiles and deploy to all ESXi hosts.
  28. Install Flash Player on all servers/clients used to manage the environment via the web client.
  29. Install the C# vSphere client on servers/clients as needed to manage the environment.
  30. Record all passwords/service account details in enterprise password management solution.


Labeling it “Simple Install” is a misnomer.  I feel like I am building my own three-tier application from scratch every time I deploy vCenter.  Quite frankly, it’s still not clear to me or other VMware experts whether VMware even supports or recommends a resilient vCenter deployment.  The process makes LUNs, fabric management, and disk balancing on a SAN look easy.  What happened to vCenter Heartbeat and all the other stuff I used to get a fault-tolerant vCenter deployment working?  Also, why after all this work do our customers’ vCenter services keep stopping every time the vCenter DB fills up? Instead of addressing these core issues and this complexity for VMware’s customers, its chief strategist is focused on something else entirely.  Nutanix.

During our experience in simplifying the storage fabric and removing centralized storage arrays from the equation, we realized there was another element of the infrastructure stack ripe for disruption.  Virtualized infrastructure deployment and management is more complex than it should be.  Amazon Web Services has shown that infrastructure can be simpler.  Unfortunately, most administrators are so used to going through the motions with their virtual infrastructure that they don’t realize how much time they spend vManaging it.

What if my virtualization management solution came pre-installed and ready to go?  I plug in an appliance and power it on, give it some IPs on my network, and everything is up and running on a distributed and highly available management fabric.  No software to download, no DBs to install, no ODBC drivers to deploy or DB connections to create, no HA or NTP to configure (it’s done automatically), no DB clustering to get running.  No patches to find and download on my own.  No tables to truncate or disks to expand.  No separate product to install just to manage updates.  Heck, what if I didn’t even need to set up the virtual servers and operating systems for the management fabric to run on?  Let’s go ahead and make the whole management infrastructure invisible to the end user.

 

VMware hasn’t figured out how to build a distributed version of vCenter, or a solid web-based interface for it

VMware has been trying to move from their single database and server model for vCenter to a scalable and distributed management fabric for some time.  They’ve made several attempts to re-architect vCenter in the last five years.  Their efforts have not succeeded.  They are stuck on an application architecture they know is outdated and has difficulty scaling and supporting their largest customer environments.  VMware also has been trying to upgrade from a client-server model to web-based browser management for their vSphere UI, but their Flash-based web interface was met with fairly universal disdain.  The company still hasn’t figured out HTML5.  While VMware has been busy protecting EMC and expanding into adjacent markets to satisfy the growth demanded by Wall Street, they’ve missed innovating on their own core business and platform.


Imagine if my virtualization management solution ran on every node in the system and could tolerate failure of any node or device?  What if my management and metadata fabric scaled with my deployment using the same technologies that leading web-scale companies use?  What if my management interface was web-based, powered by HTML5, and could be used from any browser-enabled device?  What if it had instant search, field autocompletion, and sorting/filtering across all containers, datastores, and any other entities that were managed by it?  How about easy multi-select actions, tagging, as well as administration through grouping and selecting by tags or attributes?  What if it could allow me to manage entities across hypervisors, application containers, and elements of the public cloud?  What if it did all of this with simple to use, consumer-grade design?
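Tolerating the loss of any node is usually achieved with a majority quorum, the approach used by coordination services such as ZooKeeper: the cluster keeps operating as long as a strict majority of its members can still reach each other. The sketch below is a generic illustration of that arithmetic, not a description of how any vendor's management fabric is actually built.

```python
def tolerated_failures(nodes: int) -> int:
    """How many members a majority-quorum ensemble can lose and keep working."""
    if nodes < 1:
        raise ValueError("need at least one node")
    quorum = nodes // 2 + 1  # strict majority that must stay reachable
    return nodes - quorum


for n in (1, 3, 4, 5, 7):
    print(f"{n} nodes -> tolerates {tolerated_failures(n)} failure(s)")
```

Note that four members survive no more failures than three, which is why quorum ensembles are typically sized with an odd number of members.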

 

EMC’s storage sales are declining.  VMware launched VSAN almost two years ago, but it’s still missing the majority of features that make up an enterprise storage system. EVO:RAIL is dead on arrival:

EVO:RAIL is struggling and close to getting dumped.   VSAN has seen a bit more success but it isn’t a substantial part of VMware’s business, or a replacement for EMC’s declining sales. Neither of these solutions is getting the market traction they were hoping for. They are also targeting them purely against Nutanix, rather than EMC’s cash cows that are already under fire due to the changing conditions of the storage market.  Because of VSAN’s in-kernel design, it’s been difficult for VMware to add features.   No VAAI support, compression, deduplication, data locality, one-click upgrades, etc.  They finally added snapshots.  It’s also worth recalling that, since VSAN is based on vSphere and vCenter, it has inherited their growing complexity.

Meanwhile, Nutanix continues to release improvements at a rapid pace.  We’ve redesigned our UI.  We’ve released our own virtualization management and hypervisor solution with Nutanix Acropolis.  We’ve added cloud integration.  Soon we’ll make it easy to convert nodes between hypervisors and to manage storage, VMs, and containers on the same platform.  Despite all these improvements, we’ve kept our platform simple to deploy and use.  We’re building things that people like.

In the face of all this, Chuck needs to buy time for VMware and EMC to catch up.  Why not manufacture misleading comparisons on price and performance designed to distract the market from VMware’s and EMC’s manifold deficiencies?

Performance is a topic I know fairly well and this brings me to my second question.

 

If you can’t figure out how to build a distributed system, why do you think you understand how to test one?

Chuck comes from the days of yore where SANs and dual-controllers ruled the world.  His experience with fan-in storage systems led him to believe he could use a simple set of synthetic storage benchmarks to make grandiose claims regarding the performance of VMware’s hyperconverged solution.   Unfortunately, these benchmarks are better suited for testing single disk, SAN, or LUN performance (or for generating those fancy 1M+ IOPs marketing numbers that have no relevance in the real world).


Web-scale and distributed systems are something we know very well here at Nutanix.  In the real world, the testing of these systems is not so simple.  We are well aware that you can achieve a big number of IOPs using synthetic 8K random transactions with a large shared cache that you let sit and warm up for 30 minutes… this isn’t a surprise for anyone.  But what happens when you try to run realistic workloads across multiple nodes in your system?  Do the impacts of data locality and differences in our design and our architecture come to light?
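The warm-cache effect is easy to quantify. The sketch below blends cache-hit and cache-miss latency by hit ratio, then applies Little's law (outstanding I/Os = throughput × latency) to turn that latency into IOPS at a fixed queue depth. The 100 µs hit latency, 5,000 µs miss latency, and queue depth of 32 are assumptions chosen for illustration, not measurements of any product.

```python
def effective_latency_us(hit_ratio: float, cache_us: float, backend_us: float) -> float:
    """Mean I/O latency when a cache sits in front of slower media."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_us + (1.0 - hit_ratio) * backend_us


def iops_at_queue_depth(queue_depth: int, latency_us: float) -> float:
    """Little's law: throughput = outstanding I/Os / mean latency."""
    return queue_depth / (latency_us / 1_000_000)


# Assumed, illustrative numbers: 100 us for a warm cache hit, 5,000 us for a miss.
for hit in (0.99, 0.90, 0.50):
    lat = effective_latency_us(hit, cache_us=100, backend_us=5_000)
    print(f"hit ratio {hit:.0%}: {lat:,.0f} us, ~{iops_at_queue_depth(32, lat):,.0f} IOPS at QD32")
```

Once a realistic working set outgrows the cache and the hit ratio falls, the same system delivers a fraction of the headline number, which is exactly what a short, pre-warmed synthetic run hides.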

To demonstrate these differences, we’re releasing our own set of tests, ones we are confident everyone would agree are more realistic and representative of how a customer utilizes a hyperconverged system.

The tests look at many factors:

  • Availability. How well does the hyperconverged solution tolerate failure?  What happens when a node fails during a workload? Does the system remain stable and deliver consistent performance?  Does the management fabric stay up through the failure?  Some very interesting data here.
  • Realistic Performance.  How well does the hyperconverged solution handle mixed workloads, such as running a database on one node, while running VDI workloads on several other nodes at the same time?  What happens when you throw VM snapshots, VDI bootstorms, and VM provisioning into the mix?  What about multiple workloads?  What if you have an OLTP DB workload running on one node and a Data Warehousing DB workload on another node?   What if you let these DB workloads run for 24 hours?
  • Network Utilization: One of the key aspects of a web-scale system is its respect for the network.  Network bandwidth is a shared resource that is difficult to scale.  How much bandwidth does the solution consume?  Does it leave resources available for my user VMs?  How is the solution’s performance affected when I am doing backups or data migration from or to the cluster?
  • Feature Set: Quick clones for VM provisioning? Deduplication and compression? VAAI support?  Native VM-level replication? Compatibility with multiple hypervisors?  Cloud Connect? 1-click upgrades for software, hypervisor, and BIOS/drivers?  Capability to choose the hypervisor or cloud solution that best meets your needs?  Ability to migrate and convert between hypervisors with the same solution?
  • Serviceability and Operations: How easy is the system to operate and manage?  How complex and disruptive are any hypervisor or storage fabric updates?  Does the system provide me with quality data on alerts, performance, and other important statistics?
  • Data Integrity. Does the system keep my data safe during power outages or component failures? Does it corrupt or lose data?  This is a critical aspect of any storage device or filesystem.
  • Customer Support: Cluster Health and auto support? 90+ NPS scores for customer service?  Make sure to call your vendor.
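To see why data locality matters for network utilization, a back-of-the-envelope model helps. The sketch below assumes every write is copied to (replicas - 1) other nodes and that some fraction of reads must be fetched from another node; it ignores metadata, rebuild, and protocol overhead, and the workload figures and the `east_west_mb_s` helper are hypothetical.

```python
def east_west_mb_s(write_mb_s: float, read_mb_s: float,
                   replicas: int, remote_read_fraction: float) -> float:
    """Rough per-node storage traffic that crosses the network, in MB/s.

    Assumes each write is copied to (replicas - 1) other nodes and that
    `remote_read_fraction` of reads are served by another node.
    Ignores metadata, rebuild, and protocol overhead.
    """
    if replicas < 1 or not 0.0 <= remote_read_fraction <= 1.0:
        raise ValueError("invalid replication factor or read fraction")
    replication_traffic = write_mb_s * (replicas - 1)
    remote_read_traffic = read_mb_s * remote_read_fraction
    return replication_traffic + remote_read_traffic


# Hypothetical VM: 100 MB/s of writes, 300 MB/s of reads, two copies of data.
print(east_west_mb_s(100, 300, replicas=2, remote_read_fraction=0.0))   # reads served locally
print(east_west_mb_s(100, 300, replicas=2, remote_read_fraction=0.75))  # data spread evenly over 4 nodes
```

With local reads the node sends only its 100 MB/s of replica writes; if three quarters of reads land on other nodes, the same workload generates 325 MB/s of network traffic, more than three times as much.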

A Real World Test Methodology for Web-scale Systems

Excited to see our initial set of tests?  We are very excited to share them with you.

To give you a hint of what’s coming:

  • Test #1: The Mixed Workload Test (“A Day in the Life of a Hyperconverged System”)
  • Test #2: The Multiple Database Test – OLTP + Data Warehousing (the “Noisy Neighbor” test)
  • Test #3: The Network Scalability Test
  • Test #4: The 24-hour Database Test (with snapshots)
  • Test #5: The Node Serviceability and Upgrade Test
  • Test #6: [Keeping confidential for today]
  • Test #7: [Saving this one for the next blog as well]

One last thing I want to clear up.  We offered to remove restrictions in our EULA on testing so that VMware and EMC could publish their results, if VMware and EMC were willing to remove similar restrictions in their EULA and allow Nutanix to publish our own competitive testing.  Here is what we proposed to them:

“But if you are really of the view – as we are – that customers will be better served by transparency, let’s do the following: We propose that Nutanix, VMware, and your parent, EMC, each agree to remove any and all legal restrictions that would prevent each other (as well as bloggers) from disclosing test results, benchmarking, customer evaluations, customer testimonials, customer take-outs or account wins.  Let’s not stop at Chuck’s single “synthetic” test – let’s really allow full transparency and drop the legal restrictions.  We’d be delighted to share our testing as well as the stories of actual users who are choosing to purchase Nutanix products every day based on its performance in real life situations with real life workloads.”

Chuck and his lawyers declined.  Have the courage of your convictions to accurately represent what we offered you.

While we are not allowed to release any of our test results on VSAN, we are very excited to talk more about our solution testing methodology, and how differences in our architectures will greatly impact your experience.  We are also thrilled to release more information about Nutanix Acropolis and the future of cross-hypervisor and hybrid cloud management.  Stay tuned for Part 5 of Nutanix vs. VSAN performance (Why architecture matters).

In the meantime, you can:

Thanks for reading; I look forward to your comments!

 

Nutanix launches WebScale Wish for Non-profits

This week, Nutanix launched a very cool program designed to give back to the community.  Three non-profits will receive free Nutanix equipment (total retail value approx. $500,000) to help them make over and improve their datacenters.

Head here to nominate a non-profit (or join the program if you are a non-profit).

Great to see us giving back after our success over the last 3 years.  Looking forward to seeing the non-profits that are selected for the free equipment.

 

Goodbye Dual Controller Arrays. Hyperconvergence meets All Flash.


Say goodbye to your dual controller storage arrays.

Flash has made high performing clustered storage systems a reality.  With Nutanix’s NX-9000 series announcement, webscale and hyperconverged architectures now have the capability to handle the most demanding workloads with large working sets.

Hyperconvergence.  Webscale.  Meet All Flash.

This announcement is a huge step forward for the storage industry.  Dual-controller and centralized storage architectures were popularized in the mid-’90s because they allowed you to share files, provide higher levels of availability and protection, and RAID many shelves of spindles behind a storage controller to get higher effective performance. Centralized storage (and dual-controller arrays) also gained popularity in the last decade of virtualization, since they were the only way at the time to achieve the high availability and VM mobility features touted by VMware and other leading hypervisors.

Why are we still using an architecture that was designed to share files to host virtual machine disk files (i.e., VMDKs, VHDs) that normally belong to a single VM at any given time? We shouldn't be. It's time for the next major step in virtualization.

The emergence of SSDs, advancements in distributed systems, and hyperconvergence have made dual-controller and centralized storage architectures obsolete. A single flash drive can supply orders of magnitude more random IOPs, at lower latency, than any 15K or 10K RPM spinning disk. In fact, dual-controller and centralized storage architectures now limit the operational improvements, performance, and availability that flash-based storage can bring to your enterprise datacenter.
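To put rough numbers on the spinning-media side of that comparison, a hard drive's random IOPs can be estimated from its average seek time plus half a rotation (the average wait for the target sector). The Python sketch below does only that arithmetic; the seek times are assumed, datasheet-style values rather than measurements, and transfer time and queuing are ignored. Setting the result next to the random-read figure on any SSD datasheet makes the gap easy to see.

```python
def hdd_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Estimate random IOPs for a spinning disk from seek time and rotational latency."""
    # One full rotation takes 60,000 / rpm milliseconds; on average the head
    # waits half a rotation for the target sector to arrive.
    avg_rotational_latency_ms = (60_000 / rpm) / 2
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    # One random I/O per service time, expressed per second.
    return 1000 / service_time_ms

# Assumed average seek times (illustrative values, not measurements).
for rpm, seek_ms in [(15_000, 3.5), (10_000, 4.5)]:
    print(f"{rpm} RPM: ~{hdd_random_iops(rpm, seek_ms):.0f} IOPs")
# 15000 RPM: ~182 IOPs
# 10000 RPM: ~133 IOPs
```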

All Flash needs more.

  • Flash needs storage CPUs to drive IOPs. Dual-controller architectures are limited in CPU. Need more CPU? Rip and replace, or deploy and manage yet another array.
  • Flash needs scale-out clustering. Many controllers can run dedup/compression in software across much larger data sets, saving additional flash capacity. Most dual-controller architectures can only dedup the data that sits behind them.
  • Flash needs a RAIN architecture to drive higher rates of utilization. Dual-controller architectures must be run at 50% utilization to tolerate a failure without performance impact.
  • Flash needs many medium-sized storage controllers rather than dual I/O bottlenecks. Companies run far more than two virtual machines and can benefit from balancing them across many controllers.
  • Flash needs Information Lifecycle Management (ILM). Advanced flash storage software needs to balance data between [high-performance, high-endurance, low-capacity] flash drives and [low-cost, higher-capacity, low-endurance] flash drives, much as hybrid storage systems balance data between flash SSDs and spindles today to optimize cost and performance.
  • Flash needs better garbage collection.  The dirty secret of the flash industry… more on this later.
  • Flash needs to sit as close as possible to the VM compute resources that are doing the I/O.  Dual-controller (centralized) architectures require network hops for reads and higher network utilization.
  • Most importantly, Flash needs more than performance.  You can’t drown all the issues and complexities of traditional dual controller architecture in IOPs and low latency.  Flash is only part of the next generation storage solution.
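The scale-out dedup point can be illustrated with content fingerprinting. The sketch below is a generic hash-based dedup model, not Nutanix's implementation, and the block contents are hypothetical. It counts how many unique blocks must be stored when two arrays each dedup only their own data, versus when the same data lives in one cluster-wide pool.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    """Content fingerprint; identical blocks produce identical digests."""
    return hashlib.sha256(block).hexdigest()

def unique_blocks(blocks: list[bytes]) -> int:
    """Number of blocks that must be physically stored after deduplication."""
    return len({fingerprint(b) for b in blocks})

# Hypothetical data: both arrays host VMs cloned from the same 100-block OS image,
# plus 20 blocks of application data unique to each array.
os_image = [f"os-block-{i}".encode() for i in range(100)]
array_a = os_image + [f"app-a-{i}".encode() for i in range(20)]
array_b = os_image + [f"app-b-{i}".encode() for i in range(20)]

siloed = unique_blocks(array_a) + unique_blocks(array_b)  # each array dedups on its own
pooled = unique_blocks(array_a + array_b)                 # one cluster-wide dedup domain
print(siloed, pooled)  # 240 140
```

The shared OS image is stored twice when each array dedups in isolation, and once when dedup spans the whole pool.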

 

Scale-Out trumps Rip and Replace:

Traditional storage vendors live by a three-year rip-and-replace lifecycle: storage controllers must be swapped out to take advantage of advancements in Intel x86 processors. With Nutanix's revolutionary file system, new and old storage controllers can co-exist in the same cluster, letting you immediately put advancements in Intel computing technologies to work for storage processing. Scaling out with dual-controller arrays is a capacity planning and operations nightmare. With Nutanix's scale-out clustering, you can increase your all-flash performance without a destructive and risky rip-and-replace cycle.

 

Scale-Out Storage Processing Power with Your Flash:

Flash requires storage processing power to drive IOPs and performance.  Why put all your flash behind two controllers? Dual-controller architectures cannot sufficiently drive large amounts of flash.  By spreading flash devices across many controllers, you can drive higher aggregate performance.  This performance also increases as you scale out the number of controllers.  All within a single datastore, namespace, and management domain.
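A simple model shows why the controller count matters: each controller can drive only as much flash as its CPU allows, so aggregate performance is the sum, over controllers, of the smaller of the two limits. All numbers below (SSD count, per-SSD IOPs, per-controller CPU ceiling) are hypothetical, and the model assumes drives are spread evenly and every controller actively serves I/O.

```python
def aggregate_iops(num_controllers: int, num_ssds: int,
                   iops_per_ssd: int, cpu_iops_per_controller: int) -> int:
    """Aggregate IOPs when SSDs are split evenly across controllers.

    Each controller delivers the lesser of what its CPU can process and
    what the flash behind it can supply.
    """
    if num_ssds % num_controllers:
        raise ValueError("this sketch assumes SSDs divide evenly across controllers")
    flash_iops_per_controller = (num_ssds // num_controllers) * iops_per_ssd
    return num_controllers * min(cpu_iops_per_controller, flash_iops_per_controller)

# Hypothetical hardware: 24 SSDs at 20,000 IOPs each, 100,000 IOPs of CPU per controller.
dual = aggregate_iops(2, 24, 20_000, 100_000)    # CPU-bound
scaled = aggregate_iops(8, 24, 20_000, 100_000)  # flash-bound
print(dual, scaled)  # 200000 480000
```

With two controllers, most of the flash sits idle behind saturated CPUs; spreading the same drives over more controllers lets the flash itself become the limit.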

 

Put Flash next to your VMs with Nutanix’s Data-Locality:

Data locality matters. The virtual machine data locality built into the Nutanix Distributed File System (NDFS) keeps the majority of write and read storage I/O on local flash, so read I/O does not need to traverse the network. This improves read latency while also reducing network bandwidth consumption.
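The network-savings claim can be quantified with a back-of-the-envelope model: only reads of data that is not on the VM's own node cross the network. The read volume, node count, and the 95% local fraction below are hypothetical inputs, not Nutanix measurements.

```python
def remote_read_gb(read_gb: float, local_fraction: float) -> float:
    """Read traffic that must cross the network, given the share of data held locally."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return read_gb * (1.0 - local_fraction)

nodes = 8               # hypothetical cluster size
daily_reads_gb = 500.0  # hypothetical read volume for one VM

centralized = remote_read_gb(daily_reads_gb, 0.0)        # array: every read is remote
no_locality = remote_read_gb(daily_reads_gb, 1 / nodes)  # data spread evenly, 1/N local by chance
with_locality = remote_read_gb(daily_reads_gb, 0.95)     # assumed 95% of data on the VM's node
print(f"{centralized:.1f} {no_locality:.1f} {with_locality:.1f}")  # 500.0 437.5 25.0
```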

 

Erasure Coding provides Capacity Savings for RAIN-based architectures too:

One of the biggest criticisms of RAIN-based architectures is that they require data replication. Techniques like erasure coding will also allow RAIN-based architectures to use capacity more efficiently (and compete with the capacity savings RAID achieves through parity), removing one of their key weaknesses. As flash continues to come down in price (inevitably), efficient scaling, performance, operational simplicity, and advanced architectures will trump other considerations.
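The capacity argument is simple arithmetic. Storing two full copies of every block doubles raw capacity, while a k+m erasure-coded stripe adds only m parity strips for every k data strips. The 4+1 stripe width and 100 TB figure below are illustrative assumptions, not a specific Nutanix configuration, and the sketch counts capacity only, ignoring rebuild and write-path costs.

```python
def replication_raw_tb(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies

def erasure_coded_raw_tb(usable_tb: float, data_strips: int, parity_strips: int) -> float:
    """Raw capacity for a k+m stripe: every k data strips carry m parity strips."""
    return usable_tb * (data_strips + parity_strips) / data_strips

usable = 100.0  # hypothetical usable capacity in TB
print(replication_raw_tb(usable, 2))       # 2 copies, tolerates 1 failure: 200.0
print(erasure_coded_raw_tb(usable, 4, 1))  # 4+1 stripe, tolerates 1 failure: 125.0
```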

 

All Flash Needs Better Garbage Collection:

All-flash needs garbage collection. The dirty little secret of the flash industry is what has to happen in the background. Foreground speeds are sexy, and everything looks good with a new array; what occurs over time is massive attrition of the flash. The efficiency of the background tasks is the unsexy secret of the flash industry that will expose the true character of all-flash systems. You were impeccable and fast when you were new and had a single workload with little fragmentation. What happens over time is life: a sobering fragmentation that slows you down. Great flash storage software needs an awesome background framework for garbage collection. Background tasks need CPU to introspect and retain the character of the system. Scaling out CPU allows for more efficient background cleanup without impacting performance, and it enables greater efficiency by distributing the workload with techniques like MapReduce.
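To make the background cost concrete, here is a sketch of a classic log-structured cleaning approach (the greedy policy from log-structured file system research). It is a generic technique, not a description of how Nutanix or any particular array performs garbage collection, and the segment sizes are hypothetical. The cleaner picks the emptiest segments first; cleaning a segment that is a fraction u live means rewriting u of it to reclaim 1 - u, so each unit of new data costs 1 / (1 - u) units of writes.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seg_id: int
    live_blocks: int
    total_blocks: int

    @property
    def utilization(self) -> float:
        return self.live_blocks / self.total_blocks

def pick_victims(segments: list[Segment], count: int) -> list[Segment]:
    """Greedy policy: clean the segments with the lowest fraction of live data first."""
    return sorted(segments, key=lambda s: s.utilization)[:count]

def write_amplification(u: float) -> float:
    """Writes per unit of new data when cleaned segments are a fraction u live.

    Cleaning copies u live data to free (1 - u) space, which then absorbs
    (1 - u) of new data: (u + (1 - u)) / (1 - u) = 1 / (1 - u).
    """
    if not 0 <= u < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - u)

# Hypothetical segments of 1,000 blocks each.
segments = [Segment(0, 900, 1000), Segment(1, 200, 1000), Segment(2, 500, 1000)]
victims = pick_victims(segments, 2)
print([s.seg_id for s in victims])  # [1, 2]
print(write_amplification(0.2), write_amplification(0.9))
```

The fuller the segments being cleaned, the more copying each byte of new data costs, which is why fragmentation slows an aging system down and why spare CPU for background work matters.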

 

RAIN (Redundant Array of Independent Nodes) architectures drive higher availability and utilization than dual-controller RAID (Redundant Array of Independent Disks) architectures:

This is a big one. What happens when you lose one of the two controllers in a traditional array? You've effectively lost half of your storage processing capability and performance. To ensure there is no major performance or availability impact during a failure, dual-controller arrays should be run at 50% or less of their processing capacity. What a huge waste. With RAIN, you no longer need to limit your workload to 50% to tolerate failures without impact. In a 10-node cluster, for example, you can run each node at up to 90% of its processing capacity and still be able to tolerate any single node failure without performance impact. Consumption can be optimized by letting multiple nodes assist in recovering from any failure. This allows all nodes in the system to reach higher rates of utilization while still maintaining high availability.
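The 50% and 90% figures follow from one line of arithmetic: if N nodes each run at utilization u and f of them fail, the surviving N - f nodes must absorb N * u of work, so u <= (N - f) / N. The sketch below assumes the failed node's load spreads evenly over the survivors, which is the behavior described above; the function name is illustrative.

```python
def max_safe_utilization(nodes: int, failures_tolerated: int = 1) -> float:
    """Highest per-node utilization at which the survivors can absorb failed nodes' work.

    N nodes at utilization u carry N * u of work; after f failures the
    N - f survivors (each capped at 1.0) must carry it, so u <= (N - f) / N.
    """
    if not 0 <= failures_tolerated < nodes:
        raise ValueError("failures_tolerated must be between 0 and nodes - 1")
    return (nodes - failures_tolerated) / nodes

print(max_safe_utilization(2))   # dual-controller array: 0.5
print(max_safe_utilization(10))  # 10-node cluster: 0.9
```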

 

Manage Resilient Converged Resource Pools with VM-level policies… not array LUNs, FCP, iSCSI, zoning, volumes, WWPNs, WWNNs, multipathing, etc.:

Performance is not the major problem of today's enterprise storage; management, scaling, and operations are. Flash can help provide consistent performance, but IOPs are not going to solve your largest pain points. Storage wars have devolved into a war of performance. We take a different approach at Nutanix by combining flash performance with a balanced focus on management and operations. We keep things simple for administrators so they can focus on more important aspects of IT and the business. Try racking, stacking, and deploying a 40-node Nutanix cluster. It's as easy as sliced bread (and much easier than slicing LUNs and aggregates).

 

Don’t fall for The Fiddler’s tunes:

One of the more famous villains in the "Flash" comic series was "The Fiddler", Isaac Bowin:

“In his early years, Isaac Bowin was a petty thief working the streets of India. While Bowin was attempting to rob a local merchant, local authorities found and apprehended him, whereupon he was taken to prison. During his incarceration, Bowin met an old Hindu fakir who used the power of music to hypnotize and control the actions of a deadly cobra. Bowin pressured the fakir into teaching him the mystical powers of the East, and the Hindu finally relented. Bowin proved an apt student, and before long, his knowledge surpassed that of his teacher. Using random materials found in his cell, Isaac fashioned a crude fiddle and used his new knowledge to escape from prison. He no longer required the services of the old fakir, so he used the power of his fiddle to murder him. Afterwards, he tracked down the merchant he attempted to rob earlier and executed him as well. Calling himself the Fiddler, Bowin returned to the United States to begin a new life of crime.”

The Fiddler used his musical powers to hypnotize and enslave the population of Keystone.  What was the Fiddler playing?  The same old tune on the same violin.  (Violin – perhaps there is some irony there?)

A dual controller flash array from EMC, NetApp, or any other vendor isn’t a dramatic change from the status quo.  It has the same problems and architectural limitations that your existing EMC and NetApp arrays have.  It can’t scale-out well.  You need to rip and replace it to upgrade.  You have to run it at 50% performance to be able to tolerate failure without a performance impact.  Same old NAS/SAN, with flash.

In short, flash is just one ingredient in a next-generation storage architecture. By itself, it doesn't change the game. The big storage companies understand this well. NetApp tried to fix their WAFL scaling limitations with clustered ONTAP. EMC has tried to fix their own platform limitations by buying every company they could afford (and trying to purchase us). We aren't for sale. We will continue to employ the top distributed systems talent from the leading web companies. We are here to offer you something new.

Hyperconvergence. Webscale. Meet All Flash.

 

Nutanix Support for Exchange on NFS

In light of our recent whitepaper on Best Practices for Microsoft Exchange, I wanted to point out our official support statement for Microsoft Exchange on VMware datastores backed by NFS:

Nutanix support for Microsoft Exchange in a VMware virtual environment running on VMDKs backed by an NFS datastore

Versions affected

all

Description

This statement summarizes support for Nutanix customers running licensed Microsoft Exchange applications on VMware virtual machines using NFS-based VMDKs and datastores. Widespread production deployment of NFS-based datastores for VMware virtual infrastructure has led to numerous inquiries from customers about how they can access technical support for Microsoft Exchange in this configuration. Microsoft policy is that Microsoft Exchange is not supported on VMware using NFS-based virtual disks.

 

Solution

In keeping with our company mission to offer our customers best-in-class infrastructure for virtualization, Nutanix will fully support customers running Microsoft Exchange 2010 and 2013 on VMware virtual machines with NFS-based virtual machine disks (VMDKs) on their Virtual Computing Platform. This is the preferred deployment method for Exchange 2010 and 2013 on a Nutanix system running VMware vSphere. Our commitment includes helping customers work through any MS Exchange performance issues related to our platforms. For optimal performance, we recommend that customers continue to follow standard Microsoft sizing guidance and best practices for Microsoft Exchange. Detailed Nutanix best practices for Exchange on VMware are published in our guide.

Nutanix also offers our customers the option of running Microsoft Hyper-V on the Nutanix Virtual Computing Platform with the SMB 3.0 protocol for customers who would like full support from Microsoft.

If you have any questions about our support of Microsoft Exchange, please contact Nutanix support by opening a case through the customer support portal, or email our Enterprise Solutions Engineering team at solutions@nutanix.com.

 

From the perspective of the Windows OS and the Microsoft Exchange application servers, there is little technical difference between a virtual SCSI disk stored on an iSCSI VMFS datastore, an FCP VMFS datastore, or an NFS datastore. They all support the same virtual SCSI functionality.

If you want to deploy a "Microsoft supported" solution, you should use Nutanix SMB 3.0 on Hyper-V, or iSCSI-based datastores with VMware. However, we see no technical issues with NFS-based datastores on VMware, so we have agreed to support customers who prefer this configuration for the benefits it provides.