With its Xe GPUs, Intel is now officially a maker of discrete graphics processors. Does it have what it takes to lead in the category?
Having released two Xe GPUs, Intel is now officially a maker of discrete graphics processors. There is a big difference, however, between being a participant and being a leader, and with a multifaceted graphics strategy that spans from laptops for casual gamers to high-end gaming desktops, and from entry-level Android games to supercomputers, Intel clearly wants to become a leader. But does it have what it takes?
Historically, computer graphics hardware was used for two types of applications: gaming and professional visualization. Gaming hardware spanned from arcade machines to consoles to PCs. ProViz hardware was used for computer-aided design (CAD), digital content creation (DCC), medical imaging, and various visual simulations. In the late 2000s, GPUs began to be adopted for various high-performance computing (HPC) applications.
Today, a number of emerging applications do not fit neatly into traditional gaming, ProViz, or HPC. Artificial intelligence (AI) and deep learning / machine learning (DL/ML) applications can take advantage of highly parallel GPUs, yet they need support for data formats that gaming and ProViz hardware do not offer. High-quality AR/VR simulations require the graphics horsepower of top-of-the-range GPUs, yet they are not meant to run on ProViz hardware. Cloud gaming and server-side game rendering are meant to enable games, yet off-the-shelf graphics cards are not optimal for data centers. Finally, there are GPU-accelerated applications that take advantage of GPUs’ explicitly parallel nature, yet do not belong to the HPC world as they are designed for consumers and prosumers.
When Intel hired GPU veteran Raja Koduri back in late 2017 to establish its Core and Visual Computing Group and develop discrete graphics processors, the move created more questions than answers, given Intel’s history with its i740 graphics adapter in the late 1990s and then the Larrabee gaming and HPC GPU in the late 2000s. To be a successful GPU supplier today, Intel needed to develop a family of graphics processors for traditional, HPC, and emerging applications alike, a task of extreme complexity.
In recent weeks and months, Intel finally introduced its first discrete GPU in two decades and disclosed additional details about its graphics strategy. In this article we examine Intel’s plans and talk to experts about the company’s prospects.
It’s all about architecture
When Intel rejuvenated its discrete graphics efforts three years ago, the company made it clear that the move was part of a strategic effort to better address artificial intelligence, graphics (gaming, simulations, ProViz, AR/VR, etc.), machine learning, and other performance-demanding workloads across the client, server, data center, and HPC segments. Intel’s previous attempt to build a multi-core processor that could serve both graphics and HPC workloads, codenamed Larrabee, largely failed because Intel tried to bring x86 to worlds that, perhaps, never needed it.
With Koduri at the helm, Intel started developing its Xe, an explicitly parallel architecture that could adapt to different workloads and scale from TFLOPS to ExaFLOPS in terms of performance and features. Having spent decades at various GPU companies, Koduri understands like no other that no single architecture fits all, and why both AMD and Nvidia develop special versions of their graphics processors for data centers and supercomputers these days. Therefore, adaptability is one of the key features of Intel’s Xe GPU architecture.
So far, Intel has introduced its Xe-LP architecture for integrated and low-power discrete GPUs and in the coming year the company plans to bring three more Xe GPU architectures for various workloads to the market. Obviously, Intel intends to update its graphics families regularly.
To ensure that all the capabilities and instructions of its Xe GPUs (as well as of its AI accelerators, CPUs, and FPGAs) can be easily used by software developers, Intel introduced its oneAPI programming model as well as the Level Zero direct-to-metal interface (for AI, GPU, and FPGA products only). The software stack, however, is outside the scope of this article.
Xe-LP: Let’s start with something simple
While Intel has been absent from the market of standalone graphics processors for two decades, the company kept developing integrated graphics processors aimed at inexpensive and/or low-power client PCs. That architecture, sometimes called Gen, has been good enough for its tasks, but since Intel is now looking at a much wider range of applications, it needed an all-new architecture.
The entry-level microarchitecture in Intel’s Xe stack is called Xe-LP and it will be used for integrated GPUs as well as inexpensive discrete GPUs.
On a high level, Intel’s Xe-LP is a DirectX feature level 12_1 design, just like the company’s Gen11 launched last year, albeit with some extra features. Meanwhile, Xe-LP brought tangible performance benefits from the very start thanks to the new architecture and increased frequency potential (up to 1.70 GHz vs. 1.1 GHz for the previous generation) enabled by Intel’s new 10nm SuperFin process node.
The first product to use Intel’s Xe-LP iGPU is the company’s 11th Generation Core Tiger Lake processor for notebooks and compact desktops. The highest-end version of the Xe-LP iGPU features 96 execution units (EUs) capable of 1536 FP16 FLOPS/clock; a 48 texels/clock texture engine; a 24 pixels/clock raster engine; and a revamped memory subsystem with a new L1 data cache, a 16 MB L3 cache, end-to-end compression, and a new ringbus interconnect that enables a 2X higher bandwidth (versus previous generation).
The most important architectural change in Xe-LP compared to Intel’s previous-generation architectures is its all-new execution units (EUs), which now integrate an 8-wide FP/INT ALU and a 2-wide extended math ALU. Each lane of the 8-wide FP/INT ALU is capable of 1 FP32/INT32 op/clock, 2 FP16/INT16 ops/clock, or 4 INT8 ops/clock, the better to handle AI workloads that use various datatypes. Meanwhile, to save some die space, two EUs now share thread control.
Assuming that all FP/INT ALUs are busy and operating at 1.35 GHz, Tiger Lake’s high-end Xe-LP iGPU offers ~2.1 TFLOPS of FP32 compute performance for graphics, nearly two times higher than Intel’s previous-generation Iris Plus G7 graphics (1.12 TFLOPS). Depending on the CPU model, Intel scales its Xe-LP iGPU, so there are cheaper versions with fewer EUs and other units, and correspondingly lower performance.
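The headline figure can be verified with simple arithmetic. A minimal sketch, assuming the standard peak-throughput convention that each FP32 lane retires one fused multiply-add (counted as two FLOPs) per clock:

```python
# Back-of-the-envelope check of the ~2.1 TFLOPS figure for the top
# Xe-LP iGPU. Assumes the usual convention: one fused multiply-add
# (FMA) per FP32 lane per clock, counted as two FLOPs.
EUS = 96             # execution units in the top Tiger Lake iGPU
FP32_LANES = 8       # lanes of the 8-wide FP/INT ALU per EU
FLOPS_PER_FMA = 2    # one FMA = one multiply + one add
CLOCK_GHZ = 1.35     # all-EU operating frequency assumed above

tflops = EUS * FP32_LANES * FLOPS_PER_FMA * CLOCK_GHZ / 1000
print(f"Peak FP32 throughput: {tflops:.2f} TFLOPS")  # prints 2.07
```

The division by 1000 converts GFLOPS (lanes times GHz) to TFLOPS; the result rounds to the ~2.1 TFLOPS Intel quotes.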
The new Xe-LP GPUs also come with Intel’s latest display and media engines. The latest media engine features a 12-bit end-to-end video pipeline for playing back videos in formats like BT.2020 along with hardware-accelerated decoding of the latest codecs, such as AV1. The media engine is important not only for integrated graphics (these GPUs are used mostly for productivity and media consumption), but also for various video streaming applications.
The display engine supports four pipelines as well as eDP, DisplayPort 1.2, HDMI 2.0, and Thunderbolt 4/USB4 Type-C outputs, which is good enough for today’s integrated and entry-level GPUs.
Intel positions its Xe-LP integrated and discrete GPUs both for gamers and for creators. Therefore, the company puts a lot of effort into optimizing its drivers for the best performance and quality in games. For example, Xe-LP drivers support a hardware/software scheduling codesign for added flexibility in DirectX 11-based games. The GPUs also continue to support variable rate shading (VRS) so as not to spend too much compute horsepower shading parts of a scene that are considered unimportant. VRS has to be implemented by game developers, so its gains vary from title to title. Meanwhile, Intel is also adding a game sharpening (GS) technique that boosts image clarity in games without increasing resolution, which saves bandwidth but probably puts additional load on other parts of the GPU. GS is controllable by end users.
According to Jon Peddie Research, almost 70% of PCs rely on Intel’s integrated graphics. Therefore, Intel’s transition of integrated GPUs to its new Xe-LP architecture has one important effect: game developers that address gamers with built-in GPUs will have to adjust their software (engines, middleware, games) for Intel’s latest architecture. Furthermore, Intel can tailor its drivers to ensure competitive performance and lack of visual artefacts before it ships discrete Xe products aimed at demanding gamers later on.
“Xe-LP was always the key first step to our strategy,” said Bruce Fienberg, an Intel spokesman. “This is the energy-efficient foundation on which we scale the architecture for the rest of the family and it will help us deliver our first discrete GPU in over two decades. Xe-LP addresses a large market with hundreds of millions new integrated graphics users every year, it will power the visual experience for most people worldwide.”
It is noteworthy that Intel’s Xe-LP is the most power- and area-optimized of all the Xe microarchitectures. It will certainly be used for PCs and various detachable tablets, but it is not supposed to be scaled to address things like Mobileye products, according to Intel.
Iris Xe Max ‘DG1’: The first Intel discrete GPU in decades
Intel’s first commercial discrete graphics processor in two decades is called the Iris Xe Max (previously known by its codename, DG1). The GPU is based on the Xe-LP microarchitecture and has the same configuration as the highest-end Tiger Lake integrated GPU.
The graphics processor packs 96 EUs, a 48 texels/clock texture engine, a 24 pixels/clock raster engine, a 128-bit memory controller supporting up to 4 GB of LPDDR4X, and a PCIe 4.0 x4 interface to connect to a laptop CPU that does not have too many spare PCIe lanes. The Iris Xe Max is produced using Intel’s 10nm SuperFin process technology, the same node that is used for Intel’s Tiger Lake CPUs.
Since the Iris Xe Max ‘DG1’ GPU is a discrete part designed for notebooks, it is clocked at 1650 MHz and provides up to 2.46 FP32 TFLOPS of performance. While it has the same configuration as Intel’s highest-end integrated GPU (albeit at a 22% higher frequency), its real-world performance will likely be noticeably higher still, since it has a higher thermal design power (TDP) envelope as well as its own 128-bit memory interface.
Unconstrained by Tiger Lake’s TDP and with a total memory bandwidth of 68 GB/s, the Iris Xe Max provides great performance for full-HD gaming as well as additive AI workloads, according to Intel. Apparently, the chip giant wants its DG1 to address not only entry-level gaming (which is also addressed by its built-in GPUs), but also consumer-grade compute-intensive applications (such as Topaz Labs’s Gigapixel AI).
From a gaming performance perspective, the main competitors for Intel’s Iris Xe Max GPU are AMD’s Radeon RX 560 (a 2016 GPU) and Nvidia’s GeForce GTX 1050 Ti (a 2017 GPU) or its lower-end MX350 counterpart, but not Nvidia’s Turing-based GeForce MX450. At 1080p, 2.46 FP32 TFLOPS is indeed enough for many games, but not for demanding modern titles and/or higher resolutions, which is why gamers tend to get more capable and expensive discrete GPUs. Nor is 2.46 FP32 TFLOPS enough for real-time business visualizations, which must look extremely engaging. Obviously, DG1 was not designed with such applications in mind, but it means that developers of such software will not touch Intel’s GPUs for at least another year.
“For developers, games and business visualizations represent two completely different paradigms,” said Yaroslav Lyssenko, CEO of Limestone Simulations. “Games are developed for hardware that the audience has at hands. For business VR simulations you need the best hardware possible to make them look as real as possible.”
While the Iris Xe Max is not going to be used for workloads that need much more graphics horsepower than it can provide, it will be used to accelerate a variety of content creation applications courtesy of Intel’s Deep Link and Additive AI technologies. In a nutshell, Deep Link is a software and firmware stack that balances workload and TDP between Tiger Lake’s integrated GPU and the Iris Xe Max discrete GPU, whereas Additive AI enables the compute resources of the iGPU and the dGPU to be used together within an application. Today, Deep Link and Additive AI are supported by programs like HandBrake, Topaz Gigapixel AI, and XSplit, but Intel says that eventually its acceleration technologies will be supported by Blender as well as various applications from CyberLink and Magix.
While some PC makers might attempt to build laptops pairing a CPU featuring a low-end iGPU with an Iris Xe Max dGPU in a bid to offer an ‘all-Intel’ product for gamers, it is hard to expect that Intel’s discrete graphics processor will cannibalize sales of the company’s CPUs with higher-end iGPU configurations. After all, the Iris Xe Max does not provide radically higher performance, and Intel’s advertising focus will remain on CPUs (to a large degree because it positions DG1 as a co-processor). Nonetheless, Jon Peddie, the head of Jon Peddie Research, believes that the Iris Xe Max will find its place on the market.
“DG1 will find a socket and OEMs will create a new SKU for it,” said Peddie. “Intel’s brand is so powerful OEMs won’t be able to turn their back on it. Nvidia’s GeForce MX350 is old generation so [Nvidia has added its Turing-based MX450 to the lineup] in anticipation of Intel’s push. We need to see what the costs are for that possible 30% speed increase in watts and dollars. Also based on the photos of it, it doesn’t look very small.”
Intel’s SG1: An entry-level GPU goes to datacenters
In the PC space, Intel DG1’s 2.46 TFLOPS of graphics horsepower is not a breakthrough and is only enough for 1080p gaming. But there are Android games designed for widespread entry-level and midrange smartphones that use both local and cloud resources to provide decent graphics quality and a fine experience. Furthermore, there are white-label cloud gaming service providers that offer access to non-demanding games.
These are two rapidly growing markets, and Intel believes that it can address them with its Server GPU, which is based on the Xe-LP discrete GPU silicon and has a 23 W TDP. In the 5G era, game streaming will become more pervasive than it is today, so it is strategically important for Intel to address this growing market.
“I can envision a tiered offering by the streamers,” said Jon Peddie. Intel has two design wins with streaming giants Tencent and Ubitus, and Tencent is the first customer for the XG310. Allen Fang, Deputy GM of Tencent Game Matrix, said that cloud gaming is a potentially high-growth area in the 5G era, which is strategic for Intel.
From a technology standpoint, Android game streaming requires a lot more than just fast graphics processing and video transcoding capabilities. Servers running such workloads need to ensure low latency and consistent performance, which means a software stack consisting of special graphics drivers, tailored virtual machines, and some additional optimizations. Intel has designed this stack internally, a testament to how serious the company is about game streaming and server-side rendering applications.
Intel has developed a reference design for a graphics card carrying four Server GPUs, each equipped with 8 GB of LPDDR4X memory (32 GB in total), and it believes that such a card enables Android cloud gaming providers to easily scale their graphics performance without increasing the number of servers. One of the first companies to offer Intel’s quad-GPU solution will be H3C with its XG310 board.
Intel says that one Server GPU can handle up to 20 game instances, depending on the exact title, resolution, and other factors. Therefore, one quad-GPU XG310 card can handle up to 80 users, whereas two quad-GPU XG310 boards can support up to 160 game instances while using only 184 W of power for graphics.
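The density claim boils down to simple multiplication; the sketch below merely restates the figures Intel itself quotes:

```python
# Cloud-gaming density math for the quad-GPU H3C XG310 board,
# using only the figures Intel quotes.
INSTANCES_PER_GPU = 20   # up to 20 game instances per Server GPU
GPUS_PER_CARD = 4        # four Server GPUs on one XG310 board
CARDS = 2                # a dual-card server configuration
POWER_W = 184            # graphics power Intel quotes for two cards

instances = INSTANCES_PER_GPU * GPUS_PER_CARD * CARDS
print(instances)                       # prints 160
print(round(POWER_W / instances, 2))   # prints 1.15 (watts per instance)
```

Just over one watt of GPU power per game instance is the kind of density argument Intel is making to cloud providers.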
Initially, H3C’s quad-GPU XG310 board will be used by Tencent Games. The Chinese gaming giant needs cloud rendering in a bid to ensure that all of its games work and look fine on all smartphones, even the cheapest models. In fact, entry level and even midrange handsets sometimes come with SoCs that do not have sufficient graphics horsepower to handle games on their high-resolution displays.
“When we develop a game, we usually have a so-called target device on which the game looks good and, even more importantly, works well,” said Aleksei Shulga, a producer at Creative Mobile. “Everything that is better gets certain bonuses; everything that is worse either gets a lower-quality picture or has its support dropped based on the version of Android. Keeping in mind how many devices are in the wild, nobody can guarantee fine performance and experience on all of them. For example, there are smartphones from Chinese brands with large high-resolution displays that are powered by low-end SoCs, which cannot realistically deliver sufficient game performance on such displays.”
In addition, the H3C XG310 will be used by Gamestream as well as Ubitus white-label cloud gaming services. Eventually, this or similar cards will likely be adopted by Intel’s traditional clients among video streaming services.
It remains to be seen whether Intel’s SG1 project will become a big business for the company, but at first glance it looks quite promising. Perhaps there is a more important aspect to consider, though. The software groundwork that Intel has done for its SG1 quad-GPU server card will be of great value for its upcoming Xe-HP GPUs for datacenters. Essentially, Intel has developed and commercialized a huge part of Xe-HP’s required software stack several quarters ahead of its release.
Scale everything up: Xe-HP for datacenters
As noted above, Intel’s Xe architecture is all about across-the-board scalability both in terms of performance and in terms of features. Intel’s second variant of the Xe architecture is called the Xe-HP and is designed specifically for datacenters.
For datacenters, Intel had to redesign its GPU almost from the ground up while retaining Xe’s general principles. Xe-HP uses revamped EUs that support new math formats, new floating-point formats, and new instructions. The GPU also brings IPC improvements (which might imply a new front end), features new internal fabrics that provide the bandwidth the new EUs require, and takes advantage of frequency optimizations brought by Intel’s 10nm Enhanced SuperFin process technology. The new Xe-HP GPUs support FP64, the bfloat16 format for AI/ML computing, the DP4A convolution instruction for deep learning, as well as Intel’s new XMX instructions.
“The Xe-HP was the first big leap from the Xe-LP,” said Koduri. “There were a lot of things we just had to do to hit datacenter scale. Step one was to scale everything from the humble LP. EU counts needed to go from double-digit quantities to quad digits, so we did a scale of 100X. Frequency needed a big boost, anywhere from 1.5X to 2X over the Xe-LP base. Memory bandwidth needed a 10X from integrated graphics levels. We had to scale our internal fabrics up to meet these levels. We also needed to add support for several new math formats that are not typically prioritized for integrated graphics, new floating point and AI formats in particular. [Also,] we needed to increase the IPC for several formats, by 10X in many cases.”
Intel has not announced specifications for its Xe-HP GPUs just yet, but confirms that these processors will feature ‘quad digit quantities’ of EUs running at 1.5 – 2 times the frequencies offered by Xe-LP, which is pretty vague but still gives some basic idea of what to expect. The Iris Xe integrated GPU runs at up to 1350 MHz (whereas the Iris Xe Max discrete GPU operates at up to 1650 MHz), so the Xe-HP GPUs should run at 2.0 GHz at least, but might end up at something like 2.50 GHz and above.
This is quite high by today’s datacenter GPU standards. Nvidia’s A100 runs at around 1.40 GHz, whereas the peak frequency of AMD’s Instinct MI100 is 1.50 GHz. Essentially, Intel promises that its Xe-HP GPUs will run at clocks common for multi-core CPUs but not for contemporary datacenter GPUs, which would demonstrate rather unprecedented capabilities of Intel’s 10nm Enhanced SuperFin technology compared to TSMC’s N7 and Samsung Foundry’s 8N nodes. To provide enough memory bandwidth for their EUs, Xe-HP GPUs will use HBM-type memory connected using EMIB packaging.
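The clock estimate follows directly from the 1.5X – 2X multiplier Intel has stated; a quick sketch of the implied range (the linear scaling of both endpoints is our extrapolation, not an Intel disclosure):

```python
# Implied Xe-HP clock range from Intel's stated 1.5X-2X uplift over
# Xe-LP frequencies (1350 MHz iGPU, 1650 MHz Iris Xe Max).
XE_LP_CLOCKS_MHZ = (1350, 1650)

for mult in (1.5, 2.0):
    lo, hi = (int(clock * mult) for clock in XE_LP_CLOCKS_MHZ)
    print(f"{mult}X uplift: {lo} - {hi} MHz")
# 1.5X uplift: 2025 - 2475 MHz
# 2.0X uplift: 2700 - 3300 MHz
```

Even the lower bound lands above 2.0 GHz, which is where the "at least 2.0 GHz" expectation comes from.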
Early Xe-HP silicon running with early drivers could achieve a whopping 40 TFLOPS FP32 throughput, according to Intel. To put the number into context, Nvidia’s A100 can hit 19.5 FP32 TFLOPS, whereas AMD’s Instinct MI100 is rated at 23.1 FP32 TFLOPS.
Intel says that today’s datacenter GPU market requires different parts with TDPs ranging from 75 W to 500 W. As a result, the company will offer at least three GPUs based on the Xe-HP microarchitecture, with one, two, and four Xe-HP tiles. Intel does not reveal much about further differentiation of these parts in terms of performance, but such differentiation is something to expect.
“The ‘scale everything’ was the hardest part, and the GPU architecture team was up to the task, but that wasn’t enough,” said Koduri. “We had to scale even more. The data center GPU market is growing fast; today’s solutions span from 75 W all the way up to 500 W in various form factors. We had one design team, and we had to come up with an architecture that enabled us to offer a range of solutions to our customers. Our advanced packaging team jumped in here and helped us.”
Given how popular video streaming services are today, Intel needed to ensure that its Xe-HP GPUs had top-notch media capabilities to land customers with heavy media streaming requirements (which already use Intel’s accelerators). According to Intel, a single tile of the Xe-HP architecture can transcode 10 high-quality 4K streams at 60 frames per second simultaneously, so some of its Xe-HP GPUs will have media performance currently expected from ‘a rack,’ the company claims.
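If transcode throughput scales with tile count, the one-, two-, and four-tile parts imply a simple progression. To be clear, the linear scaling in the sketch below is our assumption, not an Intel claim:

```python
# Extrapolated Xe-HP media density. Intel says one tile transcodes ten
# 4K60 streams; scaling linearly across the 1-, 2-, and 4-tile parts
# is our assumption here, not something Intel has confirmed.
STREAMS_PER_TILE = 10

for tiles in (1, 2, 4):
    streams = tiles * STREAMS_PER_TILE
    print(f"{tiles}-tile Xe-HP: up to {streams} simultaneous 4K60 streams")
```

Forty simultaneous 4K60 transcodes on one four-tile package would indeed approach what a rack of less specialized hardware delivers today.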
“The big area of focus and differentiation for us was the media,” said Koduri. “We set a lofty target for the media, well above and beyond anything that is out there for extreme density and visual quality, and far beyond what our customers thought anyone could do. We really wanted to bring rack-level media performance down to a package. More importantly, we wanted to do that at a quality level close to the best offline encoders that are out there. We also wanted to support all the existing media software, like FFmpeg, GStreamer, Handbrake, CyberLink, and Adobe, easily, and also enable their high-quality encoding paths, which are typically only enabled for CPUs. This meant we had to lean in heavily on programmability for media.”
Intel received the first Xe-HP silicon from the fab around mid-2020 and has been working with it for several months now. The company has hit multiple bring-up milestones already and has allowed select customers to access Xe-HP remotely. Today, the company even provides Xe-HP hardware to select customers. Intel expects to make its Xe-HP GPUs commercially available sometime in 2021.
“These are all pretty lofty goals, and I’m super happy to tell you that we were able to see all of this scalability in action recently,” said Koduri. “We have had Xe-HP silicon back in labs for several weeks now, and it had a successful power-on. This is a GPU bring-up that many of us will remember as the first of many.”
Enthusiast gamers, rejoice: Xe-HPG with hardware ray-tracing
Intel’s Iris Xe Max discrete GPU is an entry-level product not made to compete against higher-end graphics cards from AMD and Nvidia, but from the day Raja Koduri joined Intel, the gaming crowd has expected the company to take on the established discrete GPU suppliers in the gaming space. Earlier this year the company disclosed that next year it is going to offer yet another Xe architecture variant, Xe-HPG, optimized specifically for mid-range and enthusiast-class gaming graphics cards.
“We know at Intel that gamers are the hardest bunch to impress,” said Koduri. “They want products that have the best performance, best performance per watt, best performance per dollar, and the latest and greatest features. All at the same time. We had to leverage the best aspects of the three designs we had in progress to build a gaming-optimized GPU.”
The Xe-HPG GPUs will continue to use energy-efficient blocks from the Xe-LP, but will add hardware-accelerated ray tracing, which will have an effect on the architecture of the EUs and/or sub-slices. The graphics processors will also leverage the scalability (internal interconnects) designed for the Xe-HP as well as frequency optimizations from the Xe-HPC microarchitecture. Intel’s Xe-HPG GPUs will use GDDR6-based memory subsystems to cut the costs associated with the HBM-enabled Xe-HP parts.
By the time Intel’s Xe-HPG parts become available, game developers and Intel’s own driver team will have had enough time to optimize their software for the Xe architecture, hence the company expects its GPUs to offer high performance, fine image quality, stable drivers, and the other things expected from gaming graphics cards today. From this point of view, it makes great sense for Intel to launch inexpensive Xe-LP GPUs first.
Intel has already received the first Xe-HPG silicon back from its foundry partner and is currently testing it in its labs. Meanwhile, it is unclear whether it has started sampling it among game developers. This matters because developers need to tailor their titles for Xe-HPG to take advantage of its capabilities and ensure good performance and flawless operation. The good news is that Intel traditionally has a great relationship with the game development community, as its CPUs power over 70% of PCs used for gaming.
In addition to games, Intel’s Xe-HPG could address professional and business applications, such as ProViz and AR/VR simulations, which are getting more widespread due to the pandemic. Intel makes no comments about ProViz and has yet to talk publicly about AR/VR support in its Xe-HPG hardware and software. Addressing the ProViz market is important because it is a lucrative business, whereas AR/VR is a growing market. It should be noted that these applications need high performance and quality drivers (with certifications from ProViz ISVs), so Intel would naturally like to keep information about Xe-HPG and the markets it is going to address quiet for competitive reasons at the moment.
“VR games are very performance-demanding,” said Shulga. “Remember that we need to render each scene twice at a constant 90 frames per second; that’s 11.1 ms per frame.”
“VR interiors benefit greatly from raytracing, which requires GPU horsepower,” said Lyssenko. “For stable 90 FPS, we need the best hardware we can get. For numerous projects, [GeForce RTX] 2080 Ti is an absolute minimum.”
Given how capable Intel’s Xe-HP architecture is in terms of FP32 performance, prospects of Xe-HPG also look rather good, especially considering that it is not tied to Intel’s own nodes.
Intel says that the first Xe-HPG GPUs will hit the market in 2021 and will be made externally to take advantage of leading-edge process technologies and external IP libraries optimized for foundries. Intel believes that such an approach will be instrumental in making its Xe-HPG GPUs competitive against offerings from AMD and Nvidia in both costs and performance. Meanwhile, Intel may enjoy immediate success on the discrete GPU market thanks to its brand.
“When they introduced the i740, a hundred AIB vendors signed up and put out a product,” said Jon Peddie. “Why? Because the Intel brand is so powerful and everybody wants a piece of it. It is still more powerful than that of Nvidia and AMD.”
What remains to be seen is whether companies working exclusively with AMD and Nvidia on the graphics cards front yet supplying motherboards for Intel processors will also adopt Intel’s Xe-HPG GPUs. If they do, this might lead to major disruptions on the market of graphics adapters in general.
Xe-HPC ‘Ponte Vecchio’: the supercomputing pinnacle of the project
Intel’s Xe-HPC microarchitecture and the Ponte Vecchio GPU based on it represent the culmination of Intel’s graphics, packaging, process node, and even memory technologies.
Intel’s Ponte Vecchio GPU is a multi-tile package consisting of a base tile (made using the company’s 10nm SuperFin process), a compute tile (fabbed internally and externally), a Rambo cache tile (produced using Intel’s 10nm Enhanced SuperFin node), and an externally fabricated Xe-Link (CXL-based) tile to connect to other GPUs. Aimed at supercomputers, Intel’s ‘Exascale GPU’ is enormously complex, which is why the company had to use four tiles and external manufacturing partners to build it.
Intel has yet to disclose more details about Ponte Vecchio, but from what we know today, this part will use the most advanced version of the Xe architecture with enhancements specific to supercomputers. Meanwhile, since Intel has to make Ponte Vecchio’s compute tile both using its own next-generation (7nm) process technology and externally, it is evident that this part is too complex to be made even using Intel’s 10nm Enhanced SuperFin node.
As a vertically integrated company, Intel has traditionally been inclined to build everything alone, from a thin client PC powered by the cloud all the way to a multi-tile HPC or supercomputer chip that costs hundreds of millions of dollars to design and is extremely hard to build. Intel’s graphics strategy now largely reflects its evolving approach to manufacturing in general: the company will work with partners to build everything it needs using the process technologies that make the most sense.
Intel’s previous graphics and HPC strategy, focused around the Larrabee project, included multiple high-end processors for supercomputers that could also be used by gamers, assuming they provided competitive performance. By 2009, Intel realized that its GPU design was not competitive enough and, instead of fixing it, changed the focus of the whole project to supercomputers. Even before it turned out that Intel’s 10nm node had major problems, the company had cancelled its 10nm Knights Hill CPU, an indicator that the semiconductor giant was not satisfied with its Xeon Phi processors in general.
With its Xe-centered graphics and HPC strategy everything looks different. Firstly, Intel no longer considers graphics as a second-class citizen, so it spans its Xe architecture from humble integrated GPUs to supercomputer-class designs using four separate Xe microarchitectures. Secondly, the company also does not ignore the market of inexpensive low-power GPUs for notebooks as well as higher-end graphics cards for gamers although in both cases it will have to compete against established players like AMD and Nvidia.
Thirdly, the company will use internal production facilities to make low-power iGPUs and dGPUs as well as Xe-HP datacenter GPUs, and will turn to foundries to build its Xe offerings for HPC and gaming, which shows how flexible Intel now wants to be. Adaptability in terms of microarchitectures and flexibility in terms of manufacturing are great departures from Intel’s failed one-size-fits-all Larrabee strategy of the past decade.
“The blurry gray lines between what marketing says out of one side of their mouth and what engineering says out of the other is a tightrope all tech companies walk,” said Peddie. “I personally think Intel has no choice, the demands of cache size, bus size and bandwidth, memory type, and sundry special purpose engines (e.g., codecs, security, memory management, etc.) make it almost impossible to build a one-size-fits-all GPU.”
Intel’s Xe product stack for graphics and compute seems complete, and the company’s intention to use contract makers of semiconductors to build these products demonstrates that Intel is serious about getting into the business. What remains to be seen is how competitive Intel’s Xe processors will be against those offered by AMD and Nvidia when they enter the market. Intel has plenty of financial and engineering resources, but its rivals have been in this market for decades and have proven to be formidable competitors a number of times.
— Anton Shilov is a veteran technology writer who has covered many aspects of the electronics industry, including semiconductors, computers, displays, and consumer electronics.