
Competitors that don’t understand Storage Performance or Fair Play

I don’t get much time to blog these days… but I recently came across an article and some tweets touting how much faster a smaller competitor’s VDI appliance was than Nutanix… and more tweets saying that Nutanix shouldn’t even publish our performance data. This inspired me to write a post.

In the article and these tweets, a wannabe Nutanix competitor claims that their response times are “50% faster” than Nutanix by taking a Login VSI 4.0 chart from one of our Hyper-V Reference Architectures and comparing it to their own Login VSI 3.7 result:

[Image: Atlantis_Marketing_Comparison]

50% faster response times??? Wow!!! Impressive…  Or is it?

The way this company is presenting Login VSI results is nothing short of disingenuous (or ignorant). This company shows a lack of integrity by failing to give the competition fair and thorough consideration before making public claims about product performance. Specifically, they are:

  • Comparing Login VSI 3.7 results to Login VSI 4.0 results.
  • Taking the worst-looking Login VSI chart they could find rather than trying to emulate a similar setup for a legitimate comparison.
  • Comparing results that don’t share the same hypervisor, hypervisor configuration (specifically Intel SpeedStep and power settings), OS and base image, or application loadout.
  • Using Login VSI results to make universal claims about VDI and storage response times, when the tool is meant as a VDI CPU Density, Memory Performance, and Storage Sanity test.
  • Continuing to publicize their “competitive performance” results after being shown that they are not accurate.

In the following blog post I will explain each of these points in more detail. I will also show that if you run Login VSI 3.7 or Login VSI 4.0 on Nutanix with the correct ESXi setup, you can achieve better results than the competitor presented for their own product in this “competitive comparison”.

A quick background on Login VSI testing:

The basis of this article is the comparison of two different Login VSI charts.

Login VSI works by installing an agent on a VDI desktop to simulate a real-world application workload. Login VSI simulates an office VDI worker by running a variety of application tasks in a loop. The loop also has elements of randomization to improve the modeling of real-world testing:

[Image: LoginVSI_MediumWorkflow]

Login VSI gradually increases the number of active users on the solution while the VDI desktop agents collect response time statistics for this application workflow. Login VSI aggregates all of these statistics in a file on the Login VSI file share, which can then be opened and analyzed in Login VSI’s management console.

The results show the trend of application response times as the number of active user desktops on the target platform increases. A key result of the test is the point at which the target platform reaches saturation, called VSImax. This value equates to the maximum number of desktops that can be supported in an ideal scenario: it is both the breaking point of the system and the maximum desktop density.
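To make the idea concrete, here’s a minimal Python sketch of deriving a VSImax-style saturation point from response-time samples. This is not Login VSI’s exact formula (which changed between 3.7 and 4.0); the threshold and the data below are made up purely for illustration:

```python
# Minimal sketch of deriving a VSImax-style saturation point from
# (active_sessions, avg_response_ms) samples. This is NOT Login VSI's
# exact calculation; the threshold is a simplified stand-in.

def find_saturation_point(samples, baseline_ms, threshold_offset_ms=1000):
    """Return the first session count where average response time
    exceeds baseline + offset, i.e. the platform's breaking point."""
    for sessions, response_ms in sorted(samples):
        if response_ms > baseline_ms + threshold_offset_ms:
            return sessions
    return None  # threshold never crossed: platform not saturated

# Hypothetical data: response times climb as sessions increase.
samples = [(50, 900), (100, 1000), (200, 1250), (250, 1600), (300, 2000)]
vsimax = find_saturation_point(samples, baseline_ms=800)
print(vsimax)  # -> 300 with these made-up numbers
```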

Nutanix recommends that our customers target a VDI density 20-30% below this discovered maximum desktop density and breaking point, which should equate to an average host CPU utilization of 70-80%. The suggested desktop density strikes a good balance between maximizing CPU utilization and ensuring that the deployment can tolerate bursts such as VDI boot and login storms.
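For illustration, here is the trivial sizing math behind that guidance (the 125-desktop VSImax below is a hypothetical figure, not a published result):

```python
# Back-of-envelope sketch of the sizing guidance above: target VDI
# density 20-30% below the discovered VSImax so average host CPU
# stays around 70-80% and the host can absorb boot/login storms.

def recommended_density(vsimax, headroom=0.25):
    """Derate the discovered maximum density by the given headroom
    (0.20-0.30 per the recommendation above)."""
    return int(vsimax * (1 - headroom))

# Example: a host that saturates at 125 desktops/node should be
# targeted at roughly 87-100 desktops/node.
print(recommended_density(125, headroom=0.30))  # -> 87
print(recommended_density(125, headroom=0.20))  # -> 100
```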

Now onto why Atlantis’ claims are incorrect and misleading:

Reason #1: Login VSI workloads have changed between versions, and so have the baselines.  They aren’t even comparing the same Login VSI benchmark version and results.

Login VSI baselines have changed significantly from version to version, and Login VSI changed the workload intensity between 3.7 and 4.0. In this example, the competitor appears to compare two different versions of the tool (Login VSI 3.7 vs. Login VSI 4.0). Login VSI modified how baselines and VSImax are calculated in each version of the tool, and Login VSI 4.0 added additional workloads, so the baseline in Login VSI 4.0 is larger than the baseline in Login VSI 3.7.

From a Login VSI representative, directly to the storage vendor and user community:

“A lot of our vendors did hold on to our earlier Login VSI version 3.7 after the release of v4.0 (and still do!). Although we updated the workloads in v4.0 to a more actual and realistic set of apps, data and activities, strongly appreciated by most Enterprise customers, it did have a huge impact for the vendors and especially the technical marketing colleagues amongst you. The test results were less favorable in v4.0 and the VSImax dropped slightly as a result of the heavier workloads.”

To show this impact, here are the results of Login VSI 3.7 and 4.0 on the same Nutanix system…

Login VSI 3.7 results for Medium Workload and Performance Power settings:

[Image: LoginVSI37_MediumWorkload_Nutanix]

Here are Nutanix Login VSI 4.0 results for Medium Workload and Performance Power settings:

[Image: LoginVSI4_MediumWorkload_Nutanix]

As you can see, the baselines for Login VSI 3.7 and Login VSI 4.0 are different. The Nutanix Login VSI 3.7 run has both a lower baseline and a lower average response time than either of the graphs in the article. When you are publicly displaying competitive performance results in online articles and on TVs in your booth at VMworld, you should at least ensure you are using the same benchmark…

Reason #2: Hypervisor and hypervisor configuration, Operating System, Base Image configuration, and Application loadout affect Login VSI tests. They need to be the same in order to compare absolute results.

In the test being compared in the advertisement, Nutanix is using a Login VSI Medium workload on Hyper-V with “Balanced” power settings.

This test also has a higher-than-normal baseline due to the application loadout and hypervisor settings (such as the Intel SpeedStep power setting). Differences in the application loadout, OS version, or test harness can cause the baseline to increase. My understanding is that USX does not yet even support Hyper-V on their platform. So what are they comparing here?

Login VSI is intended to look at the relative increase in application response times as you increase the number of active sessions, in order to discover the point of saturation in a system. It should never be used to compare absolute response times unless you verify that the OS, base image, application loadout, hypervisor, and hypervisor configuration are all the same.

Atlantis took no care to ensure that the OS, base image, application loadout, hypervisor version, and hypervisor configuration were consistent before publishing their claims about product performance. They simply pulled the worst-looking Login VSI chart they could find from all of our public RAs and compared it to the best-looking chart they could come up with. You can’t compare baselines when you are using a different version of the Login VSI tool, workload, hypervisor, base image, and application loadout on the VDI image.

Or why didn’t they use this chart from p. 44 of Steve Poitras’ old XenDesktop on vSphere RA?

http://go.nutanix.com/rs/nutanix/images/TG_XenDesktop_vSphere_on_Nutanix_RA.pdf

Nutanix Login VSI 3.7 Medium workload Circa 2013 on older platform:

[Image: LoginVSI37_MediumWorkload_Nutanix2013]

For the sake of proving this with some data, I went ahead and had my team run test setups more similar to what they appear to have tested in their comparison. We tested the Login VSI 3.7 Medium workload at both 100 VMs/node and 125 VMs/node densities.

Nutanix Login VSI 3.7 Medium workload 9/2014 on NX-3460 series (100+ VMs/node):

[Image: LoginVSI37_MediumWorkload_Nutanix]

Nutanix Login VSI 3.7 Medium workload 9/2014 on NX-3460 series (125 VMs/node):

[Image: LoginVSI_MediumWorkload_Nutanix500users]

Compare these again to the Atlantis USX chart in the article:

[Image: LoginVSI37_AtlantisUSX_MarketingChart]

The Nutanix charts show an even lower baseline and average response time than Atlantis, with the Nutanix NDFS maximum response time never crossing 3000ms and the Nutanix NDFS average staying below 1500ms during the test. In the Atlantis graph, they cross 3200ms in maximum application response time several times before they reach 300 desktops, and their average response time hovers around 1600ms. The Y-axis scales of the graphs aren’t even the same. Nutanix can also achieve 125 VMs/node without a problem in this test. The density claims are invalid.

Even so… I would not use this data to say “Nutanix NDFS VDI and storage response time is X% faster than Atlantis USX”.  First, I am not sure of their exact test specifications, hypervisor settings, or base image.  More importantly, I would not make this assertion because:

Reason #3: The Login VSI 3.7 and Login VSI 4.0 Medium workloads are not a good test of storage performance or storage latency. They are a great VDI CPU Density, Memory Performance, and Storage Sanity test.

This is the most significant issue with their representation.

Let’s say, for example, they did do a competitive test with the same Login VSI version, workload, and hypervisor/OS/application loadout. Even in this ideal competitive comparison, the largest contributor to Login VSI application response times is not storage performance or storage latency. If the storage latency remains under 5ms, storage has little impact on the Login VSI application response time charting and VSImax.

Login VSI response times are driven by CPU utilization and CPU ready times. As indicated previously, Login VSI response times are the summation of application response times. These response times are bound by how long application CPU instructions take to get scheduled by the VDI OS and, in turn, by the hypervisor vCPU scheduler. In other words, response time is driven mostly by CPU utilization and scheduling efficiency. The majority of applications in this test workload do relatively minimal storage I/O. This test, for example, generates about 7-12 IOPs per VM. Storage latencies on the Nutanix Virtual Computing Platform during the test are usually less than 1ms for reads and 1-4ms for writes. A storage system like ours is capable of handling many more IOPs at similar latency. This workload is only tickling the storage I/O path.
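The back-of-envelope math on that claim, using the per-VM figures above (a quick sketch, nothing more):

```python
# Quick arithmetic behind the claim above: at ~7-12 IOPs per desktop,
# even a dense 125 VMs/node run generates roughly 875-1500 IOPs/node,
# which barely exercises the storage path.

vms_per_node = 125
iops_per_vm = (7, 12)  # observed range for this workload, per the text
node_iops = tuple(vms_per_node * r for r in iops_per_vm)
print(node_iops)  # -> (875, 1500), consistent with the ~1300 IOPs/node
                  #    figure noted later in this post
```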

The Login VSI folks are the first to admit that their current VDI workloads are not a good fit for comparing storage response time and storage performance of different systems.

Login VSI representative:

“You are correct about the IOPS usage. The current workloads are focused mainly on CPU/Mem.” — from <http://www.loginvsi.com/forum/support-v4/714-same-number-of-iops-per-desktop-on-all-workloads>

The general VDI user and application workload modeled by Login VSI does not generate enough IOPs to stress a system like ours (about 1300 IOPs/node with 125 VMs). Any small differences (1-2ms) in storage response times do not greatly affect the result graph, since they are a small component of the total Login VSI application response time. Only large issues with storage will significantly affect application response times and translate into an impact in the Login VSI results. This Login VSI workload is mostly testing the performance of the host Intel CPUs, motherboard, and memory bus.

Login VSI IOPs for each node during a 125 VMs/node test run:

[Image: IOPs]

There are great free tools out there for filesystem and storage competitive performance testing, such as IOmeter. If you want to compare storage response times and performance, I strongly recommend doing scale-out IOmeter testing across many workers, disks, and VMs, which will significantly stress the storage path. You should also be careful to set the outstanding I/O large enough to model a real-world environment, but not so large that it exceeds ESXi or Hyper-V outstanding I/O limits and starts queueing storage requests in the hypervisor.
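As a rough sketch of that outstanding-I/O sanity check (the queue depth of 32 below is a common ESXi default that I’m assuming for illustration; verify your own adapter and hypervisor limits):

```python
# Sketch of the outstanding-I/O sanity check described above. The
# device queue depth (32) is a common ESXi default and an assumption
# for illustration only; check your HBA/hypervisor settings.

def check_outstanding_io(workers_per_disk, oio_per_worker,
                         device_queue_depth=32):
    """Warn when aggregate outstanding I/O per device exceeds the
    hypervisor queue depth, which would queue requests in the
    hypervisor and skew measured storage latency."""
    total_oio = workers_per_disk * oio_per_worker
    if total_oio > device_queue_depth:
        return (f"WARNING: {total_oio} outstanding I/Os > queue depth "
                f"{device_queue_depth}; measured latency will include "
                f"hypervisor queueing, not just storage response time")
    return f"OK: {total_oio} outstanding I/Os within queue depth"

print(check_outstanding_io(workers_per_disk=4, oio_per_worker=16))
```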

Login VSI is a great tool to determine how many desktops of a certain configuration you can fit on a given server or hyper-converged host based on a specific CPU spec, OS, hypervisor, and application loadout. It is also a good tool for verifying that your storage system does not crash or have issues under a real-world workload (i.e., storage sanity testing). Based on our recommendation to never overcommit RAM in VDI deployments, the CPU of the server or host should always be the “bottleneck” in the testing, since your storage will just be doing its job. You should see an increase in Login VSI response times that corresponds directly with CPU utilization on the host. VSImax is attained when the CPU utilization on the host reaches a point of saturation (i.e., 100%) and CPU ready times for the virtual machines increase.

We have found that Login VSI’s modeling is pretty accurate to what we see at real-world VDI customers on the Nutanix platform through our autosupport and Nutanix Pulse data… customers are much more likely to run out of host CPU before they run out of storage IOPs when deploying VDI workloads at scale on Nutanix. This makes VDI an awesome fit for our hybrid (flash+HDD) and scale-out arrays. I have seen no reason to move to all-flash or specialized hardware for these VDI workloads. They are highly cacheable and benefit from the design and cost advantages of a hybrid array.

CPU density, memory performance, and storage sanity testing are exactly why we use Login VSI in house at Nutanix. We use it to validate our suggested vCPU-to-physical-core oversubscription ratios for VDI against a real-world workload.
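As a hypothetical example of that kind of validation (all of the host and desktop figures below are illustrative, not a published Nutanix spec or recommendation):

```python
# Hypothetical example of the vCPU:pCore oversubscription check that
# Login VSI runs help validate. All figures are illustrative only.

def oversubscription_ratio(vms, vcpus_per_vm, sockets, cores_per_socket):
    """vCPU-to-physical-core ratio for a given desktop density."""
    return (vms * vcpus_per_vm) / (sockets * cores_per_socket)

# e.g. 100 single-vCPU desktops on a dual-socket, 10-core host:
print(oversubscription_ratio(100, 1, 2, 10))  # -> 5.0 vCPUs per core
```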

Atlantis insinuates that these graphs are somehow related to storage performance. In reality, these Login VSI VDI response times have much more to do with the other factors presented in this post.

UPDATE:

Since I have debunked their claims… they are now asking me to respond to the exact same test with ***5000 users***, where they have inexplicably removed the “Maximum response time” line from the Login VSI results. (Why did they remove the Max response line from the result?)

My response to this benchmark and this request remains the same. As you add Nutanix nodes, you scale linearly in CPU and memory performance, and we can do 5000 just as easily as we do 500 (we also have real customer deployments 5000 large =) ). Login VSI is mostly a CPU, memory, and storage sanity test. Unfortunately, I don’t have 40 Nutanix nodes sitting around idle to teach the same lesson. We already understand the CPU density, memory density, and scalability of our platform for VDI, which has been a massive market for us. The majority of our performance equipment is now being used to iteratively test and improve our storage and file system performance so we can handle the most demanding storage I/O workloads on Nutanix (Big Oracle, Big SQL, SAP, Exchange, etc.)… and that is what is most important to me and my team at this point.

I really don’t like punching down, and I do apologize for any personal reactions on the Twitter-verse… but I am not going to stand for any public misrepresentation of Nutanix product performance and competitive performance results. I am sure this will lead to someone at Atlantis asking for a bakeoff with us. I am not interested in giving them any more attention than I already have. We have nothing to gain by beating a company that is approximately 1/10th our market capitalization.

My intention here was to show that the results they presented are not accurate for our appliance when you use a similar test setup, and that their charts publicly misrepresent Nutanix and mislead customers by making illegitimate claims about relative product performance. I am also hoping my post was somewhat educational.

Giving the competition a fair shake:

When we do competitive testing at Nutanix, we do our best to ensure that the competitive system is optimally configured and on an even playing field. For example, when we tested VMware VSAN (now part of EVO:RAIL) performance internally, we deployed it on the same hardware systems and used the exact same testing harnesses and versions. We followed all the best practices we could find online to optimize their configuration and even did some experimentation of our own to see how we could increase VSAN performance. Despite getting some favorable results for Nutanix on many of the tests, we still haven’t publicized this data and we don’t plan to. VMware deserves the benefit of the doubt on a brand-new product for them. They have great engineers, and they will continue to improve their performance and feature set as the product matures. We are going to continue to focus on improving our own performance and feature set and maintaining leadership in the market that we’ve effectively created. The point of this is that you should always give the competition a fair shake. In the end, the customer will decide.

Moral of the story:

As a customer, take every product performance claim with a healthy dose of suspicion. Also, spend more energy focusing on other important aspects of the solution (stability, manageability, features, etc.). Too often, storage wars come down to a performance bakeoff when either solution would offer more than enough performance to meet the customer’s requirements. In most cases, the customer would be better served by focusing on the differences in deployment, management, feature set, operations, industry leadership, company personnel, and vision of the competing solutions.

As a competitor, when you publicly release “competitive performance comparisons,” you had better be damn sure you are giving your competition a fair shake. Don’t stack the deck by comparing their worst test setup to your best with different hypervisors and different versions of a benchmark. Also, stay away from public competitive performance claims (and, consequently, false advertising) when you don’t understand the key variables, results, and testing tools.

 

Sincerely,

Lukas Lundell

Global Director of Solutions and Performance Engineering at Nutanix.

 

PS:  Big thanks to my teammates (specifically Will Stickland and Brad Kintner) for their contributions and ensuring we have a top notch performance lab and data at our disposal.

Comments
2 Responses to “Competitors that don’t understand Storage Performance or Fair Play”
  1. Chris Wahl says:

    Lukas –

    This is one of the most comprehensive looks into the world of Login VSI that I have read thus far. Competitive bits aside, thanks for taking the time to very thoroughly and technically explain your benchmark process and how things work under the covers.
