
Is Arm ready for server dominance?

For a long time, Arm processors have been the kings of mobile. But that didn’t make a dent in the server market, still dominated by Intel and AMD and their x86 instruction set. Major disruptive shifts in technology don’t come often. Displacing strong incumbents is hard and predictions about industry-wide replacement are, more often than not, wrong.

But this kind of major disruption does happen. In fact, we believe the server market is seeing tectonic changes that will eventually favor Arm-based servers. Companies that prepare for it are well positioned to extract value from this trend.

This became clear this week, when AWS announced its new generation of instances based on the Graviton2 chip. The Graviton2 System on a Chip (SoC) is built around the Arm Neoverse N1 core. AWS claims the new instances are much faster than their predecessors, a claim we put to the test in this article.

There is also movement in other parts of the ecosystem: startups like Nuvia are a sign of times to come. With a $53 million Series A just announced, a founding team that packs years of experience in chip design, and Jon Masters, a well-known Arm advocate in the Linux community and previously Chief Arm Architect at Red Hat, as its VP of Software, Nuvia is a name you will hear a lot more about pretty soon.

At ScyllaDB, we believe these recent events are an indication of a fundamental shift, not just another inconsequential offering. Companies that ride this trend are in a good position to profit from it.

The commoditization of servers and instruction sets

The infrastructure business is a numbers game. In the end, personalization matters little and those who can provide the most efficient service win. For this reason, Arm-based processors, now the dominant force in the mobile world, have been perennially on the verge of a server surge. It’s easy to see why: Arm-based servers are known to be extremely energy efficient. With power accounting for almost 20% of datacenter costs, a move towards energy efficiency is definitely a welcome one.

The explosion of mobile and IoT has been at the forefront of the dramatic shift in the eternal evolutionary battle of RISC (Arm) vs. CISC (x86). As Hennessy and Patterson observed last year, “In today’s post-PC era, x86 shipments have fallen almost 10% per year since the peak in 2011, while chips with RISC processors have skyrocketed to 20 billion.” Still, as recently as 2017, Arm accounted for only 1% of the server market. We believe the market is now at an inflection point, and we’re far from the only ones with that thought.

There have been, however, practical challenges to adoption. Unlike in the x86 world, there hasn’t so far been a dominant set of vendors offering a standardized platform. The Arm world is still mostly custom-made, which is an advantage in mobile but a disadvantage in the server and consumer markets. This is where startups like Nuvia can change the game, by offering a viable standards-based platform.

The cloud also radically changes the economics of platform selection: inertia and network effects are stronger in a market with many buyers who naturally gravitate towards their comfort zone. But as more companies offload (and upload) their workloads to the cloud and refrain from running their own datacenters, innovation becomes easier whenever the cost savings justify it.

By analogy, if you look at the gaming market, there has been strong lock-in based on your platform of choice: Xbox, PS4 or a high-end gaming PC. But as cloud gaming emerges from a niche into the mainstream, with Project xCloud and any number of other cloud gaming platforms enabling you to play your favorite games from just about any device, that hardware lock-in becomes less prevalent. The power shifts from the hardware platform to the cloud.

Changes are easier when they are encapsulated, and that’s exactly what the cloud brings to the table. Compatibility of server applications is not a problem for new architectures: Linux runs just as well across multiple platforms, and as applications become more and more high level, the moat provided by the instruction set gets demolished and the decision shifts to economic factors. In an age where most applications are serverless and/or microservice-oriented, interacting with cloud-native services, does it really matter what chipset runs underneath?

Arm’s first foray into the cloud: EC2 A1 instances

AWS announced in late 2018 the EC2 A1 instances, featuring their own AWS-manufactured Arm silicon. This was definitely a signal of a potential change, but back then, we took it for a spin and the results were underwhelming.

Executing a CPU benchmark on the EC2 A1 and comparing it to the x86-based M5d.metal hints at just how big the gap is. As you can see in Table 1 below, the EC2 A1 instance performs much worse in all of the CPU benchmark tests conducted, with the exception of the cache benchmark. For most of the others, the difference is not only present but huge, certainly much bigger than the 46% price advantage the A1 instance has over its M5 x86 counterparts.

Test   | EC2 A1  | EC2 M5d.metal | Difference
cache  | 1280    | 311           | +311.58%
icache | 18209   | 34368         | -47.02%
matrix | 77932   | 252190        | -69.10%
cpu    | 9336    | 24077         | -61.22%
memcpy | 21085   | 111877        | -81.15%
qsort  | 522     | 728           | -28.30%
dentry | 1389634 | 2770985       | -49.85%
timer  | 4970125 | 15367075      | -67.66%

Table 1: Result of the stress command: stress-ng --metrics-brief --cache 16 --icache 16 --matrix 16 --cpu 16 --memcpy 16 --qsort 16 --dentry 16 --timer 16 -t 1m

But microbenchmarks can be misleading. At the end of the day, what truly matters is application performance. To put that to the test, we ran a standard read benchmark of the ScyllaDB NoSQL database, in a single-node configuration. Using the m5.4xlarge as a comparison point — it has the same number of vCPUs as the EC2 A1 — we can see that while the m5.4xlarge sustains around 610,000 reads per second, the a1.metal is capable of doing only 102,000 reads/s. In both cases, all available CPUs are at 100% utilization.

This corresponds to an 84% decrease in performance, which doesn’t justify the lower price.

Figure 1: Benchmarking a ScyllaDB NoSQL database read workload with small metadata payloads, which makes it CPU-bound: EC2 m5.4xlarge vs EC2 a1.metal. ScyllaDB achieves 600,000 reads per second in this configuration on the x86-based m5.4xlarge, while the a1.metal performs 84% worse and is only 46% cheaper.
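The exact parameters of this benchmark were not published here, but as a rough sketch, a comparable CPU-bound read workload can be driven with cassandra-stress, the load generator that ships with ScyllaDB (the node address, row count and thread count below are illustrative assumptions, not the values used in our test):

# Populate the cluster with small rows first (illustrative row count)
cassandra-stress write n=10000000 -mode native cql3 -node 10.0.0.1 -rate threads=200

# Then drive a sustained read workload against the same data
cassandra-stress read duration=10m -mode native cql3 -node 10.0.0.1 -rate threads=200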

Aside from the CPU power itself, the EC2 A1 instances are EBS-only, which means running a high-performance database or any other data-intensive application is a challenge of its own, since they lack the fast NVMe devices present in other instances like the M5d.
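As an aside, it is easy to check what kind of storage an instance actually exposes. On Nitro-based instances such as the A1, even EBS volumes show up as NVMe devices, so the device model is what tells them apart (nvme-cli is assumed to be installed):

# List block devices; EBS volumes report the model "Amazon Elastic Block Store",
# while local instance storage reports "Amazon EC2 NVMe Instance Storage"
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
sudo nvme list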

In summary, while the A1 is a nice wave to the Arm community, and may allow some interesting use cases, it does little to change the dynamics of the server market.

Arm reaches again: the EC2 M6 instances

This all changed this week, when AWS announced during its annual re:Invent conference the availability of its new class of Arm-based servers based on the Graviton2 processor, including the M6g and M6gd instances.

We ran the same stress-ng benchmark set as before, but this time comparing the EC2 M5d.metal and EC2 M6g. The results are more in line with what we would expect from running a microbenchmark set against such different architectures: the Arm-based instance performs better, sometimes much better, in some tests, while the x86-based instance performs better in others.

Test   | EC2 M6g  | EC2 M5d.metal | Difference
cache  | 218      | 311           | -29.90%
icache | 45887    | 34368         | +33.52%
matrix | 453982   | 252190        | +80.02%
cpu    | 14694    | 24077         | -38.97%
memcpy | 134711   | 111877        | +20.53%
qsort  | 943      | 728           | +29.53%
dentry | 3088242  | 2770985       | +11.45%
timer  | 55515663 | 15367075      | +261.26%

Table 2: Result of the stress command: stress-ng --metrics-brief --cache 16 --icache 16 --matrix 16 --cpu 16 --memcpy 16 --qsort 16 --dentry 16 --timer 16 -t 1m

Figure 2: EC2 M6g vs EC2 A1. The M6g class is five times faster than the A1 at running reads in the ScyllaDB NoSQL database, in the same workload presented in Figure 1.

Figure 3: EC2 M6g vs the x86-based M5, both of the same size. The performance of the Arm-based server is comparable to the x86 instance. With AWS claiming that prices will be 20% lower than x86, economic forces will push the M6g ahead.

Figure 4: CPU utilization during the read benchmark, for 14 CPUs. They are all operating at capacity. The data shown is for the M6g, but all three platforms behave the same way. ScyllaDB uses two additional virtual CPUs for interrupt delivery, which are not shown, bringing the total to 16.
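This kind of per-CPU view is easy to reproduce on your own runs. One simple way, not necessarily the tool used to produce Figure 4, is mpstat from the sysstat package:

# Print per-CPU utilization once per second while the benchmark runs
mpstat -P ALL 1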

For database workloads, the biggest change comes with the announcement of the new M6gd instance family. Just like the x86-based M5 and M5d families, the M6gd features fast local NVMe storage to serve demanding data-driven applications.

We took them for a spin as well using IOTune, a utility distributed with ScyllaDB that benchmarks the storage system so the database can be tuned to it at installation time.
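As a reference for reproducing the setup described below, here is a minimal sketch of assembling two local NVMe devices into a RAID0 array and evaluating it with iotune (the device names and mount point are illustrative assumptions):

# Assemble the two local NVMe devices into a single RAID0 array
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1

# Create a filesystem and mount it where ScyllaDB keeps its data
sudo mkfs.xfs /dev/md0
sudo mount /dev/md0 /var/lib/scylla

# Run the evaluation against the mounted array
iotune --evaluation-directory /var/lib/scylla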

We compared storage for each of the instances, in both cases using 2 NVMe cards set up in a RAID0 array:

M5d.metal

Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 1517 MB/s
Measuring sequential read bandwidth: 3525 MB/s
Measuring random write IOPS: 381329 IOPS
Measuring random read IOPS: 765004 IOPS

M6gd.metal

Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2027 MB/s
Measuring sequential read bandwidth: 5753 MB/s
Measuring random write IOPS: 393617 IOPS
Measuring random read IOPS: 908742 IOPS

Metric                 | M6gd.metal | M5d.metal | Difference
Write bandwidth (MB/s) | 2027       | 1517      | +33.62%
Read bandwidth (MB/s)  | 5753       | 3525      | +63.21%
Write IOPS             | 393617     | 381329    | +3.22%
Read IOPS              | 908742     | 765004    | +18.79%

Table 3: Result of IOTune utility testing

The M6gd NVMe cards are, surprisingly, even faster than the ones provided by the M5d.metal. This is likely by virtue of them being newer, but it clearly shows that the new architecture poses no storage penalties.

Summary

Much has been said for years about the rise of Arm-based processors in the server market, but so far we still live in an x86-dominated world. However, key dynamics of the industry are changing: with the rise of cloud-native applications, hardware selection is now the domain of the cloud provider, not of the individual organization.

AWS, the biggest of the existing cloud providers, released an Arm-based offering in 2018 and now, in 2019, catapults that offering to a world-class spot. With results comparable to x86-based instances, and with AWS’s ability to offer a lower price thanks to well-known attributes of Arm-based servers such as power efficiency, we consider the new M6g instances a game changer in a red-hot market ripe for change.

Editor’s Note: The microbenchmarks in this article have been updated to reflect the fact that running a single instance of stress-ng would skew the results in favor of the x86 platforms, since in SMT architectures a single thread may not be enough to use all resources available in the physical core. Thanks to our readers for bringing this to our attention.

About Vladislav Zolotarov

Vlad specializes in networking, mostly L2. He has worked on projects for Mellanox, on the bnx2x Linux device driver for Broadcom, and on the ScaleMP Virtual Device System for network interfaces. Vlad studied at the Israel Institute of Technology (Technion) and holds a B.Sc. in Computer Science.

About Glauber Costa

Glauber Costa is a staff engineer at Datadog. Previously he was VP of Field Engineering at ScyllaDB. Before ScyllaDB, Glauber worked on virtualization in the Linux kernel for 10 years, with contributions ranging from the Xen hypervisor to all sorts of guest functionality and containers.