In light of Apple’s recent M1 processor announcement, let me ponder why Apple passed on chiplets, and where I believe chiplets make more sense.
Recent angry comments on EE Times’ interview with Intel’s Ramune Nagisetty disparaged the current heterogeneous integration and chiplet discussions as more of a rehash than an innovation and, furthermore, simply a way for American manufacturers to obfuscate their inability to stay at the leading edge of wafer fabrication. Although a great deal of the package-level integration that has been discussed is not ground-breaking innovation, there is little doubt that we are in the middle of a significant shift in integration away from system-on-chip (SoC) design.
Espousing the virtues of the SoC approach after that introduction is odd timing, but the latest foray into chip design at Apple was just announced. The M1, based around a custom piece of Apple SoC silicon, will power the new MacBook Air as well as some MacBook Pro and Mac mini models. If we can take the promotional images provided by Apple at face value, calling the M1 processor die an SoC is certainly no exaggeration. I take all due blame for my focus on the chiplet approach to integration, but there is none of that here. Looking at the M1 die photo from Apple, breaking this type of design up into chiplets would not be an attractive prospect. The additional interconnection and communication overhead would create more headaches than it is worth.
The other argument in favor of the SoC approach for Apple is that they are mostly past the point of using anyone else’s physical layout IP cores. They are responsible for the bulk of the circuit blocks and are focusing their efforts on keeping tight control over physical design and hardware-software integration to optimize the system and (presumably) improve the user experience. They wouldn’t be buying either a vendor-designed piece of silicon or a hard IP core to be stitched into their processor design.
Before comparing the pure SoC design to current layouts that might be more amenable to a chiplet approach, there are a couple more things to mention about the M1.
The idea of Apple silicon taking over the computer platform just as it did the iPhone and iPad sockets has been around for a while. One approach mentioned was that the ARM-based processors for mobile might move into the traditional computer space just by bringing the iOS software along with the A-series chip. As Apple seemed to be concentrating most of its resources on the iPhone and iPad to the detriment of the computer division, that seemed plausible. But we now see that Apple is designing for their full macOS system.
Looking again at the general layout of the M1 silicon, the design is very reminiscent of the A-series processors and mobile application processors generally. It may take some time to find out just what overlap there is, and I am sure that many floorplan analysts will begin to draw parallels between Apple’s A14 and their M1.
Late-generation computers – especially Apple laptops – are often maligned for designing out any simple upgrade path for RAM by using soldered-down BGA packages rather than plug-in modules. The M1 brings this concept closer to the processor by co-locating DRAM on a common package substrate with the M1 die. But hold on. These are not chiplets. The Apple photos depict them as some sort of packaged commodity DRAM, the same as would be on a module or motherboard.
Apple doesn’t design DRAM (yet), but it does happily rebrand commercial off-the-shelf components:
“M1 also features our unified memory architecture, or UMA. M1 unifies its high‑bandwidth, low‑latency memory into a single pool within a custom package. As a result, all of the technologies in the SoC can access the same data without copying it between multiple pools of memory. This dramatically improves performance and power efficiency. Video apps are snappier. Games are richer and more detailed. Image processing is lightning fast. And your entire system is more responsive.”
Apple claims the M1 MacBook Air will be 3.5 times as fast as the latest Intel-powered version. Part of the performance increase could come from the type of DRAM they are using for this version. Unfortunately, “unified memory architecture” doesn’t provide any clues. Back to the silicon, combining the GPU onto the die could well be another performance boost. More will be known once mortals get their hands on the newest Apple computers.
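To make the “single pool” claim in Apple’s description a little more concrete, here is a loose software analogy in Python. It is only a sketch and says nothing about how the M1 hardware actually works: the names frame, cpu_view, gpu_view, and gpu_copy are hypothetical stand-ins, with memoryview playing the role of two blocks sharing one pool and bytes() playing the role of a copy into a separate pool.

```python
# Toy analogy only: a bytearray stands in for a buffer in memory.
frame = bytearray(16)

# "Unified" case: two consumers look at the same underlying bytes.
cpu_view = memoryview(frame)
gpu_view = memoryview(frame)

cpu_view[0] = 255                 # the "CPU" writes a value
shared_sees_write = gpu_view[0]   # the "GPU" sees it with no copy made

# "Separate pools" case: the consumer works on its own copy of the data.
gpu_copy = bytes(frame)           # explicit copy into a second pool
cpu_view[1] = 7                   # a later "CPU" write...
copy_sees_write = gpu_copy[1]     # ...is not visible in the stale copy

print(shared_sees_write, copy_sees_write)
```

In the shared case nothing is moved and both sides stay in sync; in the copied case the data has to be transferred again every time it changes, which is the overhead the Apple description says it avoids.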
Turning back to breaking up chips and optimizing systems through package integration, it is interesting to compare the die layout of the SoC M1 to a more traditional multi-core processor. For example, Intel has been producing a range of microprocessors to address segments within each of the laptop, desktop, and server markets. Just looking at laptops as an example, there is a need to address customers looking for the cheapest possible machine for web browsing as well as gamers wanting the best performance. Intel needs to produce chips with anywhere from a few to many cores. This can be done with different designs or by disabling cores to address the lower cost applications. Neither option seems very attractive.
Flexibility to optimize for a narrower range of applications is offered by splitting cores (and other functionality) out as chiplets that can be integrated as needed on the packaging platform. That is not to say that there are no additional costs with the chiplet approach or that all the technical challenges have been overcome, but the potential is certainly there. Intel has been actively promoting the idea through two packaging techniques they refer to as Foveros and embedded multi-die interconnect bridge (EMIB), both of which have devices in production.
AMD are also very active in this area, and their new Zen 3 architecture designs highlight the scalability of the chiplet approach. AMD splits the design into compute core die (CCD) and a separate die for I/O functions – the less creatively labeled IOD, for I/O die. Naming conventions aside, the new Ryzen 5000 chips are a good representation of the chiplet approach to integration as well as processor core scaling. First off, the Ryzen designs add more cores simply by adding a second CCD to double the number of cores for specific needs. Second, the Ryzen is an example of the heterogeneous angle to the integration. While the processor die are manufactured on TSMC’s 7nm technology platform, the IOD chiplet is manufactured at GlobalFoundries on their 12nm process. That’s the promise of the paradigm.
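The two points about the Ryzen parts, scaling by adding a CCD and mixing process nodes in one package, can be sketched as a small data model. This is an illustration under stated assumptions, not AMD’s actual product definition: the Chiplet and Package classes and the build_package helper are hypothetical, and the figure of 8 cores per CCD is an assumption used only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    name: str
    process: str
    cores: int = 0


@dataclass(frozen=True)
class Package:
    name: str
    chiplets: tuple

    @property
    def total_cores(self):
        # Core count is just the sum over the dies placed in the package.
        return sum(c.cores for c in self.chiplets)

    @property
    def processes(self):
        # Distinct process nodes present in the package.
        return sorted({c.process for c in self.chiplets})


# Assumption for illustration: 8 cores per CCD.
CCD = Chiplet("CCD", "TSMC 7nm", cores=8)
IOD = Chiplet("IOD", "GlobalFoundries 12nm")


def build_package(name, ccd_count):
    """Hypothetical helper: N compute dies plus one I/O die."""
    return Package(name, (CCD,) * ccd_count + (IOD,))


single = build_package("one-CCD part", 1)
dual = build_package("two-CCD part", 2)

for pkg in (single, dual):
    print(pkg.name, pkg.total_cores, pkg.processes)
```

The same two die designs cover both products; only the count of compute dies placed on the package changes, which is the flexibility a monolithic SoC does not offer.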
Now it’s time to speculate on a bifurcated future for processor development. The captive market inside Apple for silicon designs is very specific. At most, Apple has two distinct product types – laptops and desktops. I expect the same design to be reused in the iMac line. After all, the M1-based MacBook Pro uses the same M1 as the Air, with the addition of fans that allow the same processor to be pushed a little harder. The Mac Pro line would not likely get its own design. That product will likely continue to use Intel processors as long as Apple keeps supporting that end of the market.
What about mobile phone application processors? Apple’s A-series along with Qualcomm’s Snapdragon and Samsung’s Exynos and others will continue to design full SoC die for that market.
Intel and AMD are a different story. They are both designing for the entire spectrum of computers beyond Apple and need to cut a wide swath of possible users. More design flexibility increases the potential to capture more of the market. Covering the bases with too many products or the same die by selectively activating more cores for higher end applications is a difficult proposition in the long run. The chiplet approach makes sense here, and we already see AMD and Intel both headed in that direction.
Going back in time, Xilinx produced an early implementation of what we now call heterogeneous integration with their 2.5D integration approach utilizing a silicon interposer as a platform for multiple FPGA die slices on the Virtex-7. At the time, we were talking about this as a first step toward 3D integration. A presentation from Ivo Bolsens, the CTO of Xilinx, suggested that the 2.5D approach could be a long-term alternative to 3D. Almost a decade later, it appears so.
Advanced modern FPGA products are another place to look for more chiplet opportunities. A quick look through the Xilinx catalog reveals a heavy on-die SoC integration approach today with ARM cores, PCI Express options, and transceivers along with the programmable arrays. For example, the Xilinx Zynq product line includes the UltraScale+ RFSoC with quad-core ARM Cortex-A53, dual-core Cortex-R5F, PCI Express, DisplayPort, USB 3.0, along with on-chip SRAM and a host of controllers for external memory.
Looking backward again, Intel also looked at a chiplet approach, or multi-chip module (MCM) as it was called back in the day, to integrating FPGA die with their processors. At the 2010 Intel Developer Forum, the Atom Processor E600 series was announced. The E600 was a configurable Atom processor on a common BGA substrate with an Altera FPGA.
With all the view to the past, let me take the opportunity to recognize and thank all of our veterans and service men and women since we just observed Veterans Day (Remembrance Day here in Canada). And that led me to ponder Apple’s use of the M-series designation for their computer platform processors. I don’t think the corporate culture of Apple would allow them to use “Garand” to code-name the M1, but it would be a unique approach.