Arm's latest architecture, v9, is designed to meet the needs of modern computing, from the IoT up to supercomputers.
Arm has launched a major architecture revision, Arm v9, which provides additional features for security, confidential computing, and AI, as well as boosting overall performance. Arm said it expects v9 to deliver more than 30% performance uplift over the next two mobile and infrastructure generations. AI features, thus far most typically found in GPUs, will be available across the company's CPUs, GPUs, and NPUs.
The previous architecture from Arm, v8, was launched a decade ago, and Arm expects v9 to dominate computing silicon from IoT to supercomputing applications for the next 10 years.
“Even I still marvel at how pervasive our technology has become,” said Arm CEO Simon Segars. “In a year’s time, our partners will have shipped a cumulative total of 200 billion chips. Putting that in context, half of that number, the first 100 billion, took 26 years to reach the market. If our prediction is correct, the second 100 billion chips will have shipped in just five years. Our objective is to allow the broadest set of developers to write fast, run fast on Arm.”
Arm stressed that v9 will be a decade-long project, with v9.1, v9.2, and so on expected to roll out on an annual cadence from here on. The key features announced with this initial launch largely concern two areas: Arm’s response to the global demand for ubiquitous specialized computing, and its efforts to increase security for every application.
“Arm v9 is a rolling program of substantial enhancements to the architecture that we’ll be deploying over the next few years, increasing the computing capability in the widely applicable areas of digital signal processing and machine learning, and improving the security and robustness of our systems,” said Richard Grisenthwaite, SVP, chief architect and Arm Fellow.
Peter Greenhalgh, Arm Fellow and VP of Technology, described the challenges of developing processor IP for tomorrow’s computers, including increasingly complex, constantly evolving, heterogeneous workloads in the mobile, automotive, and infrastructure markets. Advanced process nodes can pick up some of the slack, but they are costly and lengthen production timeframes.
“There’s a requirement that new chips must provide excellent ROI on traditional compute workloads that people care about today, and on the future workloads that people will care about tomorrow,” he said. “Given the high cost of failure for tapeout both in absolute cost terms and also in market window impact, there’s also a requirement to be using proven, high quality IP. In the decade of Arm v9, we are going to deliver the technology that enables the performance and quality that the market needs.”
The move to Arm v9 is expected to deliver more than 30% performance uplift over the next two mobile and infrastructure generations. Arm is working on technologies to maximize frequency, bandwidth, and cache size, and to reduce memory latency, in order to extract the maximum performance from the CPU.
Greenhalgh added that while there’s some debate about the merits of specialized accelerators, video processors and AI/ML accelerators are “here to stay.” However, the demands of today’s commercial workloads mean accelerators must be programmable – this includes everything from libraries and C compilation to virtualization so that they can be easily used in a cloud environment, all the way through to debug and performance analysis. Add in requirements for security, and suddenly your accelerator design has grown to become more CPU-like, he said.
“From this perspective, our belief is that we should continue to extend the CPU architecture so that our CPUs can accelerate even more workloads and do so in a way that is programmable, protected, pervasive, and proven,” Greenhalgh said. “Today it’s impossible to ignore how fragmented some of the AI and DSP workloads are in the mobile market and how they can benefit from being coalesced onto a CPU environment. This is where we want to push our architecture and compute designs.”
Arm v9 will introduce a number of new features dedicated to AI, including increased hardware support for AI across its entire portfolio of CPUs, GPUs and NPUs. This is based on Arm’s belief that all processors will need to handle AI workloads, from supercomputing to the cloud to the endpoint device.
“We believe that purpose-built system design will be the key to innovation in all forms of computing,” said Grisenthwaite. “Different computing problems need different mixes of computing components. Many IoT devices need to interpret their world, and a combination of the M profile cores with the Ethos-U55 microNPU is perfect there. In automotive systems, partners will increasingly be combining many large and small CPUs with GPUs, NPUs and their own IP to generate the right computing solution for those autonomous systems.”
Jem Davies, Arm Fellow and VP and GM of the company’s Machine Learning Group, described how these different mixes of computing components might work in a VR headset (big NPU and GPU alongside little NPUs and CPUs), smartphones (big CPU and GPU alongside little CPUs and NPUs), and IoT devices (little CPU and NPU).
“For these three use cases, you’d ideally build three different systems on chip with three very different types and sizes of processors,” Davies said. “Get the balance wrong, and you have a chip that’s too slow or costs too much because you invested in processing you don’t need, or one that uses the wrong processor for the workload and kills your battery or green energy rating… when choosing hardware for AI, we absolutely see that one size does not fit all. Choices that are right for one partner, one device or one use case will simply not apply elsewhere.”
Arm v8 introduced support for FP16 and BFloat16 arithmetic, which are popular in AI processing, as well as a feature called the Scalable Vector Extension (SVE). SVE was developed in collaboration with Fujitsu and others for the Fugaku supercomputer’s processors; it adds vector processing capabilities to improve AI and DSP performance.
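BFloat16 keeps float32’s 8-bit exponent (and therefore its dynamic range) but retains only 8 bits of mantissa, which is why it is popular for AI workloads. A minimal illustration of the format, truncating a float32 bit pattern to its top 16 bits (truncation is used here for simplicity; hardware conversion typically rounds to nearest):

```python
import struct

def to_bfloat16_bits(x):
    """Truncate a float32 bit pattern to its top 16 bits (bfloat16).
    Round-toward-zero via truncation, for simplicity of illustration."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bfloat16_bits(b):
    """Expand 16 bfloat16 bits back to a float by zero-filling the
    low 16 mantissa bits of the float32 pattern."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x
```

Round-tripping pi through bfloat16 yields 3.140625: the 8-bit exponent preserves float32’s range, but only about two to three decimal digits of precision survive, which is an acceptable trade-off for many neural-network workloads.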
“[SVE] was designed in a scalable manner so that the concepts used for supercomputers can be applied across a far wider range of products,” said Grisenthwaite. “We’ve added increased functionality to create SVE2, the enhanced scalable vector extensions, to work well for 5G systems and many other use cases such as virtual and augmented reality, and also for machine learning within the CPU. Over the next few years, we’ll be extending this further with substantial enhancements in performing matrix-based calculations within the CPU.”
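The scalability Grisenthwaite describes comes from SVE being vector-length agnostic: the same binary runs correctly whether the hardware implements 128-bit or 2048-bit vectors, because loops are governed by predicates rather than a compile-time width. A plain-Python sketch of that loop structure (the `vector_length` parameter stands in for the hardware’s vector width; this is an illustration, not real SVE code):

```python
def vla_sum(data, vector_length):
    """Sum `data` in vector-width chunks. The loop is written without
    baking in vector_length, mirroring SVE's vector-length-agnostic
    style; a plain-Python illustration, not real SVE code."""
    total = 0
    i = 0
    while i < len(data):
        # Like an SVE WHILELT predicate: only the lanes still inside
        # the array bounds are active on the final, partial iteration.
        active = min(vector_length, len(data) - i)
        total += sum(data[i:i + active])
        i += active
    return total
```

The result is identical for any `vector_length`, which is the property that lets the same code run on a phone with narrow vectors and on a supercomputer with wide ones.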
The other big focus for Arm v9 is security. Specifically, a feature called Realms has been developed in collaboration with Microsoft over the last five years in order to enable confidential computing as part of the Arm Confidential Computing Architecture (Arm CCA). Arm CCA builds on the secure and non-secure worlds in today’s TrustZone.
“Today, the traditional model of computing places a tremendous amount of trust on the operating systems and hypervisors that applications are run on,” Grisenthwaite said. “Confidential computing removes the assumption that the privileged software that’s responsible for running the computing system needs to be able to see or manipulate the data of those running sessions. That removal will make it far easier to trust the computing infrastructure.”
Realms will enable applications or services to run in such a way that their data is protected from inspection or intrusion by the host or any other software running on that host. This can be applied to virtual machines in the cloud, but equally to different apps on a smartphone.
“Typically when you lease capacity from a hyperscale provider, you get a [virtual machine] hosted on a multi-tenant system where you get some share of the common address space,” explained Mark Hambleton, VP open source software at Arm. “In the Arm CCA world, one of the simplest changes will be that instead of the VM being hosted from this common address space, it’ll be hosted in a realm where the address space is protected from other VMs sharing the system. The same is true on a laptop or PC when you have a second operating system sharing the host resources.”
Hambleton said that a typical Android system today will have a mix of non-secure software that runs the core stack, some secure services running under TrustZone, and maybe some digital rights management (DRM) services running as virtual machines alongside Android. One possibility is that some of the secure services could migrate from TrustZone into their own realm, enabling a more dynamic environment for that service. The DRM service could also move into its own realm, giving it a boost in confidentiality as its data is now protected from the core Android stack.
Mixed-criticality applications such as robotics and automotive may also use Realms to separate services that feed safety critical systems, protecting their memory from interference, Hambleton said.
Realms will not be available immediately, but will be part of a future revision of Arm v9. Before that happens, another new security feature, the Memory Tagging Extension (MTE), will become available.
MTE, developed in collaboration with Google, can be used to find both spatial and temporal memory issues in software.
“A depressing reality is the root cause of many of these [security issues] really come back to the same old memory safety issues that have been plaguing computing for the last 50 years,” said Grisenthwaite. “Two particularly common memory safety problems, buffer overflow and use after free, seem to be incredibly persistent over the years. And a huge part of the problem is that they are frequently present in software for years before they are discovered and exploited.”
MTE will allow software to associate a tag with a pointer to memory, and to check that this tag is correct upon each use of the pointer. If the access is out of bounds, or if the memory has since been freed and reused, the tag check will fail.
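Conceptually, MTE colors each 16-byte granule of memory with a small tag and embeds a matching tag in the pointer’s unused top bits; every access compares the two. A toy model of that mechanism (the class and method names here are invented for illustration and are not the real ISA):

```python
class TaggedHeap:
    """Toy model of memory tagging: every 16-byte granule of memory
    carries a 4-bit tag, and every pointer embeds the tag it was
    allocated with. Each access checks the pointer's tag against the
    memory's tag, as MTE does in hardware. Hypothetical sketch only."""

    GRANULE = 16

    def __init__(self):
        self.mem_tags = {}  # granule index -> current tag
        self.next_tag = 0

    def alloc(self, addr, size):
        # Re-tag every granule covered by the allocation, and return
        # a "pointer" (tag, address) carrying the same tag.
        self.next_tag = (self.next_tag + 1) % 16
        first = addr // self.GRANULE
        last = (addr + size - 1) // self.GRANULE
        for g in range(first, last + 1):
            self.mem_tags[g] = self.next_tag
        return (self.next_tag, addr)

    def load(self, ptr, offset=0):
        tag, addr = ptr
        granule = (addr + offset) // self.GRANULE
        if self.mem_tags.get(granule) != tag:
            # Out-of-bounds access, or the memory was freed and
            # re-tagged: the pointer's tag no longer matches.
            raise MemoryError("tag check fault")
        return "ok"
```

A buffer overflow faults because the adjacent granule carries a different tag; a use-after-free faults because reallocation re-tags the granule while the stale pointer still carries the old tag, which is how MTE catches exactly the two bug classes Grisenthwaite names.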
MTE is one of the first features launching with Arm v9 and will be available in the first generation of Arm v9 CPUs. Software support for MTE will be introduced in Android 11 and openSUSE.
Another topic discussed at Arm’s launch event was standardization, specifically the balance between too much standardization, which means Arm customers cannot develop differentiated solutions, and too little, which decreases software compatibility.
Arm already has a successful program for servers, the Server Base System Architecture (SBSA), with its certification program, ServerReady, which encourages what Arm sees as the right balance of standardization. As part of Arm v9, the scope of this program will be broadened to include edge and endpoint devices, under a program called SystemReady.
“Following on from the success of ServerReady, we started to look at how we could reach the end goal of any operating system being able to run on any Arm-based hardware, at least in the A-class,” Hambleton said. “Sometimes we find that standards struggle to catch up to the breadth and diversity of the Arm ecosystem, which is why SystemReady has been designed from the ground up to support all the needs of the Arm ecosystem, from the smallest to the largest device.”
MediaTek’s CTO Kevin Jou appeared briefly in Arm’s presentation to say that MediaTek’s first smartphone product with an Arm v9 CPU will be commercially available by the end of this year. Most Arm partners will be looking to produce Arm v9-based samples in around the same time frame, with first v9 production silicon beginning to appear in 2022.
This article was originally published on EE Times.
Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European industry for EETimes Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a master’s degree in Electrical and Electronic Engineering from the University of Cambridge.