Nutanix vs. VSAN Price and Performance — Part 4 (or why Chuck and EMC are deathly afraid of us)
We have recently suffered through several blog posts from Chuck Hollis, EMC’s chief blogger and strategist on loan to VMware. Chuck released these posts to coincide with (and distract from) our first .NEXT user conference in Miami earlier this month. We declined to respond to them at the time. Our company was proud to be launching its first user conference, and we wanted the focus during the last few weeks to be on our customers, their experiences with our product, and our newest technology developments and announcements.
For me personally, it was an exciting milestone to see Nutanix move from our humble beginnings, building our own IKEA desks and running our own Ethernet cables across the floor, to hosting a conference dedicated to our users. I never imagined we would come so far in such a short time. I also didn’t realize we’d attract so many mortal enemies bent on our destruction along the way. This is the cost of disrupting an industry filled with entrenched interests.
It’s our real-life version of HBO’s Silicon Valley, and Hooli, Gavin, Chuck, and EMC will stop at nothing to steal, stifle, and stall innovation to protect their monopoly. Lawyer fracases, intentional misdirection, blatant misinformation, and “brain rapes.” It’s all on the table.
We try not to let it distract us. As our CEO Dheeraj said during his first keynote, invoking Amazon’s Jeff Bezos, “We will stay focused on the customers, while our competitors stay focused on us.” Full disclosure: we were also busy celebrating with our incredible customers and partners at .NEXT, reminiscing on the adventures of the past five years but also looking forward to the next stage.
We are back home from Miami now, and we owe Chuck some sort of acknowledgement for all the recognition he has given us in the last few weeks. His blog posts present what purports to be an “objective” comparison of performance and price. What this pretense sidesteps entirely (more accurately, what it attempts to obscure) is the profound difference that design and architecture make in simplicity, reliability, and performance under realistic workloads. In other words, the things customers care about most.
A few questions came to mind as I read his most recent rumination…
Why are Chuck, VMware, and EMC targeting Nutanix?
EMC is losing ground, but VMware is a massive $37B company. It has seen tremendous success in the last decade or so. Its hypervisor runs in most enterprise datacenters on the face of the Earth, and it certainly has no trouble attracting folks to VMworld, which boasts 22,000+ attendees. But VMware also faces major competition from the public cloud players. Hyperconvergence makes infrastructure solutions like VMware’s simpler and more attractive, and we’ve had great success as VMware solution partners. So why is VMware’s chief strategist running a 20-part blog series targeting Nutanix? Why keep attacking us? Why is VMware paying so much attention to Nutanix at every level of the company? (And how much influence is EMC having on its behavior?) VMware has never felt the need to publicly single out a company of our size. Ever. So why now?
I think the Twitterverse, our customers, and our partners have no illusions about what’s animating Chuck and EMC.
Chuck and EMC are afraid of us. But why?
It’s not about a new hypervisor. It’s not even about our storage-related technologies. Amazon has polarized the IT landscape, and Nutanix is bringing the same “one-click” simplicity to managing on-premises infrastructure and bridging it with the public cloud. At Nutanix’s core is a distributed management fabric built with this in mind. Chuck, EMC, and VMware can’t retrofit their management stack to meet the demands of this new landscape… it needs a complete rewrite.
VMware’s vCenter is starting to show its age and complexity
VMware vCenter has been the gold standard for managing on-premises VM-based environments for the last decade. Unfortunately, it hasn’t changed much over that time; if anything, it has gotten more complex. When I was a budding VMware architect for Accenture’s R&D labs, it was a fun and exciting technology on the bleeding edge. Today vCenter is bloated with features that most customers pay for but frankly don’t care about or use, features that haven’t made their lives easier or their businesses more effective. This decade of bloat complicates a solution that was built on the original merits of simplicity, consolidation, cost savings, and the core features of HA, DRS, and (the crowd favorite) vMotion.
I’ll illustrate my point by describing what should be a fairly simple process — setting up a resilient vCenter deployment for my datacenter:
- Provision two Windows Platform Services Controller (PSC) VMs, using HCL-supported operating systems.
- Run Windows update and fully patch both PSC VMs.
- Join the PSC VMs to your Windows domain, and reboot.
- Mount the vCenter ISO on PSC #1 and run the installer, deploying an external PSC. Join an existing SSO domain, or start a new SSO domain depending on your requirements.
- Mount the vCenter ISO on PSC #2 and run the installer, joining the second PSC to the first PSC/SSO domain.
- Manually configure your third party load balancer (F5, NetScaler, etc.) per VMware instructions for HA PSCs.
- Provision one vCenter Windows Server VM on an HCL-supported operating system.
- Run Windows update and fully patch the vCenter VM.
- Install the Desktop Experience/Flash player on the vCenter VM.
- Run Windows update again to patch Flash/desktop experience.
- Provision a pair of HCL-listed clustered SQL servers for database high availability. Do not use SQL AlwaysOn Availability groups, as this is not supported. Deploy a traditional SQL cluster for HA.
- Manually create vCenter databases in SQL.
- Install the ODBC driver on the vCenter server, using an HCL-supported SQL version.
- Create vCenter service account.
- Create ODBC connection to SQL database.
- Mount the vCenter ISO on the vCenter VM and start the installation process.
- Deploy vCenter, using the HA PSCs and HA SQL servers.
- After vCenter is installed, add ESXi hosts to vCenter.
- Review VUM SQL HCL, and create a VUM database on a supported SQL version.
- Create VUM database in SQL.
- Create VUM ODBC connector on vCenter server.
- Install VUM on vCenter server, and configure downloads/schedule.
- Scan ESXi hosts for update and patch as needed.
- Use Derek Seaman’s SSL toolkit and VMware Certificate manager and manually create/deploy SSL certificates for PSCs, vCenter, VUM and ESXi hosts. Go Team Derek!
- Update load balancer SSL certificates to support high availability.
- Configure VMware HA to protect PSCs, SQL, and vCenter VMs.
- Configure NTP on all ESXi hosts, or configure host profiles and deploy to all ESXi hosts.
- Install Flash Player on all servers/clients used to manage the environment via the web client.
- Install the C# vSphere client on servers/clients as needed to manage the environment.
- Record all passwords/service account details in enterprise password management solution.
Labeling it a “Simple Install” is a misnomer. I feel like I am building my own three-tier application from scratch every time I deploy vCenter. Quite frankly, it’s still not clear to me or to other VMware experts whether VMware even supports or recommends a resilient vCenter deployment. The process makes LUNs, fabric management, and disk balancing on a SAN look easy. What happened to vCenter Heartbeat and all the other pieces I used to rely on to get a fault-tolerant vCenter deployment working? And why, after all this work, do our customers’ vCenter services keep stopping every time the vCenter DB fills up? Instead of addressing these core issues and this complexity for VMware’s customers, VMware’s chief strategist is focused on something else entirely: Nutanix.
While simplifying the storage fabric and removing centralized storage arrays from the equation, we realized there was another element of the infrastructure stack ripe for disruption. Virtualized infrastructure deployment and management is more complex than it should be. Amazon Web Services has shown that infrastructure can be simpler. Unfortunately, most administrators are so used to going through the motions with their virtual infrastructure that they don’t realize how much time they spend vManaging it.
What if my virtualization management solution came pre-installed and ready to go? I plug in an appliance, power it on, give it some IPs on my network, and everything is up and running on a distributed and highly available management fabric. No software to download, no DBs to install, no ODBC drivers to deploy or DB connections to create, no HA or NTP to configure (it’s done automatically), no DB clustering to get running. No patches to find and download on my own. No tables to truncate or disks to expand. No separate product to install just to manage updates. Heck, what if I didn’t even need to set up the virtual servers and operating systems for the management fabric to run on? Let’s go ahead and make the whole management infrastructure invisible to the end user.
VMware hasn’t figured out how to build a distributed version of vCenter, or a solid web-based interface for it
VMware has been trying for some time to move vCenter from its single-database, single-server model to a scalable, distributed management fabric. It has made several attempts to re-architect vCenter in the last five years, and those efforts have not succeeded. VMware is stuck on an application architecture it knows is outdated, one that has difficulty scaling to and supporting its largest customer environments. VMware has also been trying to move its vSphere UI from a client-server model to web-based browser management, but its Flash web interface was met with fairly universal disdain, and the company still hasn’t figured out HTML5. While VMware has been busy protecting EMC and expanding into adjacent markets to satisfy the growth demanded by Wall Street, it has missed innovating on its own core business and platform.
What if my virtualization management solution ran on every node in the system and could tolerate the failure of any node or device? What if my management and metadata fabric scaled with my deployment using the same technologies that leading web-scale companies use? What if my management interface was web-based, powered by HTML5, and could be used from any browser-enabled device? What if it had instant search, field autocompletion, and sorting/filtering across all containers, datastores, and any other entities it managed? How about easy multi-select actions, tagging, and administration through grouping and selecting by tags or attributes? What if it could allow me to manage entities across hypervisors, application containers, and elements of the public cloud? What if it did all of this with a simple-to-use, consumer-grade design?
EMC’s storage sales are declining. VMware launched VSAN almost two years ago, but it’s still missing most of the features that make up an enterprise storage system. EVO:RAIL is dead on arrival.
EVO:RAIL is struggling and close to getting dumped. VSAN has seen a bit more success, but it isn’t a substantial part of VMware’s business or a replacement for EMC’s declining sales. Neither solution is getting the market traction VMware was hoping for. VMware is also positioning both purely against Nutanix, rather than against EMC’s cash cows that are already under fire from the changing conditions of the storage market. Because of VSAN’s in-kernel design, it has been difficult for VMware to add features: there is still no VAAI support, compression, deduplication, data locality, one-click upgrades, and so on. VMware finally added snapshots. It’s also worth recalling that, since VSAN is based on vSphere and vCenter, it has inherited their growing complexity.
Meanwhile, Nutanix continues to release improvements at a rapid pace. We’ve redesigned our UI. We’ve released our own virtualization management and hypervisor solution with Nutanix Acropolis. We’ve added cloud integration. Soon we’ll make it easy to convert nodes between hypervisors and to manage storage, VMs, and containers on the same platform. Despite all these improvements, we’ve kept our platform simple to deploy and use. We’re building things that people like.
In the face of all this, Chuck needs to buy time for VMware and EMC to catch up. Why not manufacture misleading comparisons on price and performance designed to distract the market from VMware’s and EMC’s manifold deficiencies?
Performance is a topic I know fairly well and this brings me to my second question.
If you can’t figure out how to build a distributed system, why do you think you understand how to test one?
Chuck comes from the days of yore, when SANs and dual controllers ruled the world. His experience with fan-in storage systems led him to believe he could use a simple set of synthetic storage benchmarks to make grandiose claims about the performance of VMware’s hyperconverged solution. Unfortunately, these benchmarks are better suited to testing single-disk, SAN, or LUN performance (or to generating those fancy 1M+ IOPS marketing numbers that have no relevance in the real world).
Web-scale and distributed systems are something we know very well here at Nutanix. In the real world, testing these systems is not so simple. We are well aware that you can achieve a big IOPS number using synthetic 8K random transactions against a large shared cache that you let sit and warm up for 30 minutes… this isn’t a surprise to anyone. But what happens when you run realistic workloads across multiple nodes in your system? Do the effects of data locality, and of the differences in our design and architecture, come to light?
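The data-locality question lends itself to a back-of-the-envelope calculation. The sketch below is a deliberately simplified model built on illustrative assumptions, not a description of how Nutanix or VSAN actually place data: without locality, each block’s replicas land on `rf` distinct nodes chosen uniformly at random; with locality, one replica always sits on the node running the VM. It estimates how much of one node’s storage I/O has to cross the network. The function name and all inputs are hypothetical.

```python
def remote_traffic_mbps(read_mbps: float, write_mbps: float,
                        nodes: int, rf: int, data_local: bool) -> float:
    """Expected MB/s of one node's storage I/O that must cross the network.

    Hypothetical model for illustration only; not either vendor's placement policy.
    """
    if rf < 1 or nodes < rf:
        raise ValueError("need 1 <= rf <= nodes")
    if data_local:
        # Assumed locality policy: one replica always lives on the VM's own node.
        p_local_copy = 1.0
        remote_copies_per_write = rf - 1
    else:
        # Assumed uniform placement: replicas on rf distinct random nodes, so the
        # VM's node holds a copy with probability rf / nodes.
        p_local_copy = rf / nodes
        remote_copies_per_write = rf - rf / nodes
    # Reads with no local copy are fetched from a remote node.
    remote_reads = read_mbps * (1.0 - p_local_copy)
    # Every replica not stored locally is shipped over the network.
    remote_writes = write_mbps * remote_copies_per_write
    return remote_reads + remote_writes


if __name__ == "__main__":
    # Hypothetical workload: 4 nodes, RF2, 300 MB/s reads and 100 MB/s writes per node.
    for local in (False, True):
        mbps = remote_traffic_mbps(300, 100, nodes=4, rf=2, data_local=local)
        print(f"data_local={local}: {mbps:.0f} MB/s over the network")
```

Under these made-up inputs the model puts 300 MB/s per node on the wire without locality versus 100 MB/s with it. It ignores caching, VM migration, and rebuild traffic, so treat it as a way to reason about the question, not as a measurement.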
To demonstrate these differences, we’re releasing our own set of tests, ones we are confident everyone would agree are more realistic and representative of how a customer utilizes a hyperconverged system.
The tests look at many factors:
- Availability. How well does the hyperconverged solution tolerate failure? What happens when a node fails during a workload? Does the system remain stable and deliver consistent performance? Does the management fabric stay up through the failure? Some very interesting data here.
- Realistic Performance. How well does the hyperconverged solution handle mixed workloads, such as running a database on one node, while running VDI workloads on several other nodes at the same time? What happens when you throw VM snapshots, VDI bootstorms, and VM provisioning into the mix? What about multiple workloads? What if you have an OLTP DB workload running on one node and a Data Warehousing DB workload on another node? What if you let these DB workloads run for 24 hours?
- Network Utilization: One of the key aspects of a web-scale system is its respect for the network. Network bandwidth is a shared resource that is difficult to scale. How much bandwidth does the solution consume? Does it leave resources available for my user VMs? How is the solution’s performance affected when I am doing backups or data migration from or to the cluster?
- Feature Set: Quick clones for VM provisioning? Deduplication and compression? VAAI support? Native VM-level replication? Compatibility with multiple hypervisors? Cloud Connect? 1-click upgrades for software, hypervisor, and BIOS/drivers? Capability to choose the hypervisor or cloud solution that best meets your needs? Ability to migrate and convert between hypervisors with same solution?
- Serviceability and Operations: How easy is the system to operate and manage? How complex and disruptive are any hypervisor or storage fabric updates? Does the system provide me with quality data on alerts, performance, and other important statistics?
- Data Integrity. Does the system keep my data safe during power outages or component failures? Does it corrupt or lose data? This is a critical aspect of any storage device or filesystem.
- Customer Support: Cluster Health and auto support? 90+ NPS scores for customer service? Make sure to call your vendor.
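The availability questions above (does performance stay consistent when a node fails?) can be reduced to a few numbers once you have a throughput time series. The sketch below is one possible way to score such a series, not the methodology behind our tests: given IOPS samples taken at a fixed interval, the index of the sample at which a node was failed, and a tolerance, it reports the pre-failure baseline, the worst post-failure dip as a fraction of that baseline, and how many samples it took for throughput to return to, and stay within, the tolerance band. The function name, the 10% default tolerance, and the sample values are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def failure_impact(iops: Sequence[float], failure_at: int,
                   tolerance: float = 0.1) -> Tuple[float, float, Optional[int]]:
    """Score a throughput time series around an injected node failure.

    Returns (baseline, worst_ratio, recovery_samples); recovery_samples is None
    if throughput never settles back within `tolerance` of the baseline.
    """
    if not 0 < failure_at < len(iops):
        raise ValueError("need samples both before and after the failure")
    baseline = mean(iops[:failure_at])          # steady state before the failure
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    after = list(iops[failure_at:])
    worst_ratio = min(after) / baseline         # deepest dip relative to baseline
    floor = baseline * (1.0 - tolerance)
    recovery: Optional[int] = None
    for i in range(len(after)):
        # Recovered once every remaining sample stays at or above the floor.
        if all(v >= floor for v in after[i:]):
            recovery = i
            break
    return baseline, worst_ratio, recovery


if __name__ == "__main__":
    # Hypothetical samples, one per interval; a node is failed at index 4.
    samples = [1000, 1000, 1000, 1000, 400, 600, 950, 1000, 990]
    print(failure_impact(samples, failure_at=4))
```

For the hypothetical series shown, the baseline is 1000 IOPS, the worst dip is 40% of baseline, and throughput is back within 10% two samples after the failure. Requiring every later sample to stay above the floor keeps a single lucky sample from counting as recovery.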
A Real World Test Methodology for Web-scale Systems
Excited to see our initial set of tests? We are very excited to share them with you.
To give you some hints at what’s coming:
- Test #1: The Mixed Workload Test (“A Day in the Life of a Hyperconverged System”)
- Test #2: The Multiple Database Test – OLTP + Data Warehousing (the “Noisy Neighbor” test)
- Test #3: The Network Scalability Test
- Test #4: The 24-hour Database Test (with snapshots)
- Test #5: The Node Serviceability and Upgrade Test
- Test #6: [Keeping confidential for today]
- Test #7: [Saving this one for the next blog as well]
One last thing I want to clear up. We offered to remove the testing restrictions in our EULA so that VMware and EMC could publish their results, provided VMware and EMC were willing to remove similar restrictions from their EULAs and allow Nutanix to publish our own competitive testing. Here is what we proposed to them:
“But if you are really of the view – as we are – that customers will be better served by transparency, let’s do the following: We propose that Nutanix, VMware, and your parent, EMC, each agree to remove any and all legal restrictions that would prevent each other (as well as bloggers) from disclosing test results, benchmarking, customer evaluations, customer testimonials, customer take-outs or account wins. Let’s not stop at Chuck’s single “synthetic” test – let’s really allow full transparency and drop the legal restrictions. We’d be delighted to share our testing as well as the stories of actual users who are choosing to purchase Nutanix products every day based on its performance in real life situations with real life workloads.”
Chuck and his lawyers declined. Have the courage of your convictions, and accurately represent what we offered you.
While we are not allowed to release any of our test results on VSAN, we are very excited to talk more about our solution testing methodology, and how differences in our architectures will greatly impact your experience. We are also thrilled to release more information about Nutanix Acropolis and the future of cross-hypervisor and hybrid cloud management. Stay tuned for Part 5 of Nutanix vs. VSAN performance (Why architecture matters).
In the meantime, you can:
- Check out our Acropolis announcement, demo, and keynote.
- View our .NEXT intro and .NEXT wrap-up video.
- Learn about our partnerships with VMware.
- Sign up and try the Nutanix Community Edition (based on Acropolis) first hand.
- Read more on all of our .NEXT announcements and the era of invisible infrastructure.
- Or click here for a video reminiscent of EMC’s latest acquisitions.
Thanks for reading, and I look forward to your comments!