IBM's research ecosystem promises full-stack innovation for game-changing AI chips, but its goals are lofty...
As the AI hardware landscape starts to become more clearly defined, we are seeing three main paradigms. Some of the chip industry's big hitters are adapting their existing compute architectures for AI accelerators (Intel, Nvidia). Then we have the big data center players (Amazon, Google), who are throwing money at the problem and developing their own accelerator architectures, but keeping them for their own use. And finally we have the startups: around 70 at last count, working on novel compute architectures for every AI niche from the data center to the IoT.
The running theme is the siloed approach; all the companies are battling it out as individuals. Can any single company, even one as large as Intel or Google, achieve the kind of phenomenal performance gains required by cutting-edge, rapidly developing AI algorithms?
Enter IBM, with an interdisciplinary approach to advancing AI hardware like nothing we’ve seen so far in this space. The company has set up an organization, the AI Hardware Center, based in IBM Research’s lab in Albany, New York, and is building an ecosystem of partners to work together on IBM’s goal.
An AI Ecosystem (Almost) from Scratch — Podcast #108, with IBM Research vice president Mukesh Khare and Synopsys vice president Arun Venkatachar.
“We believe that bringing experts together will really improve the rate of innovation and rate of progress as compared to one company trying to do everything by themselves,” said Mukesh Khare, vice president of IBM Systems Research. “I think in the end, working together will mean more progress, benefiting all of us, and a solution that is much more broadly adopted.”
Big Blue’s aim is to improve AI compute performance 2.5x per year, with an ambitious overall goal of 1000x performance efficiency (FLOPS/W) improvement by 2029.
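As a quick sanity check on those numbers, assuming the 2.5x gain compounds yearly (the compounding model here is an illustration, not IBM's published roadmap math):

```python
import math

annual_gain = 2.5       # IBM's stated yearly improvement target
overall_goal = 1000.0   # targeted FLOPS/W improvement by 2029

# Years of compounded 2.5x gains needed to reach the overall goal:
years_needed = math.log(overall_goal) / math.log(annual_gain)
print(f"{years_needed:.1f} years of {annual_gain}x gains reach {overall_goal:.0f}x")
```

That works out to roughly seven and a half years of sustained 2.5x improvements, consistent with a 2029 target for a program launched in 2019.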
“[The targets] are based on a very good understanding of how we can improve technology all the way [across the stack],” Khare said. “The exciting thing about IBM and IBM Research is that we can bring expertise of the full stack together. In our AI Hardware Center, we have the entire value chain of semiconductor technology all the way from devices and materials, to chip design, to system architecture, algorithms and applications.”
There are 14 partners, commercial and academic, in IBM's ecosystem so far. These include technology companies such as Samsung, Applied Materials, Tokyo Electron, Synopsys and Red Hat.
“We have to look at the improvement in the entire stack, end to end, because we cannot only improve, let’s say, software and ignore hardware, or we cannot only improve chip design and not worry about devices and materials,” Khare said. “Every part of this value chain is a significant opportunity for innovation. If we at IBM Research can bring experts together for the entire stack, this could be really revolutionary.”
The idea, he explained, is to give materials and manufacturing equipment companies access to the design and application part of the stack, as well as provide the design layer with knowledge about what’s coming in the pipeline in terms of new devices and materials. Companies in every part of the stack should be able to learn from each other in order to make the best possible AI hardware and software solution, so the whole community can benefit.
IBM’s key partner in the EDA space is Synopsys. The two companies started working together on this project about two years ago.
“The reason why this is very interesting for Synopsys is because it touches multiple aspects of the hardware ecosystem, because IBM doesn’t just build the core architecture, they have the entire journey going all the way from materials research all the way down to software,” said Arun Venkatachar, vice president of AI and central engineering at Synopsys. “What excites us with the IBM [ecosystem] is the collaboration at all these different levels.”
Synopsys has been providing EDA tools, including some custom tools, with which IBM has already taped out several chips. Venkatachar said that Synopsys was eager to work with IBM to solve some underlying issues, especially in the area of device physics.
“Being a research organization gives IBM Research the ability to explore in a much faster and more agile way compared to other companies,” Venkatachar said. “IBM Research was able to turn around architectural changes within six months… we learned quite a bit from that journey.”
Learning is a big part of Synopsys’ motivation for joining the ecosystem, since AI chips present many unique challenges for EDA tools.
“Looking at the entire partnership as a whole gives a good visibility into how an AI chip comes together – it takes a village to build something like that,” Venkatachar said. “The AI architecture has some unique aspects compared to a CPU design or a GPU design, especially when it comes to: How do you do exploration of the architecture? How do you go about building memory close to your compute? And how do you do interconnects? How do you do data path validation? How do you fit such massive designs onto a chip?”
Venkatachar laid out four key challenges that AI chips present. Firstly, the designs are on a scale that hasn’t been seen before. Synopsys addresses scalability challenges for huge designs with its ZeBu hardware emulation technology.
The second challenge is power — considering the amount of compute needed, all AI applications are power-sensitive. How early in the design process power analysis can be explored is going to be key, Venkatachar said, for both cloud and edge designs.
Third is architectural exploration. Synopsys is developing a tool for early exploration of architectures which can model different architectural elements and consider the different topologies of neural networks.
And finally, there is the challenge of building totally new tools for aspects like data path validation, which are very important in AI hardware. Scaling, optimisation, and place and route on a complicated AI architecture pose a big challenge, addressed by new innovations within Synopsys’ Fusion compiler. Synopsys’ tools are also evolving to handle the device physics space, simulating different materials so that designers can know whether a design will work without having to build it.
So innovation across the whole stack is important, but does it really need to happen all at the same time?
For example, Red Hat has recently joined the ecosystem. IBM is working on building compatibility between its first-generation AI cores and Red Hat OpenShift, the popular enterprise Kubernetes platform for deployment in the hybrid cloud.
“All of our hardware innovation that we are bringing in is going to be enabled in the Red Hat ecosystem in parallel, rather than in serial,” said IBM’s Mukesh Khare. “So we develop both hardware and software together and learn from each other and make each other better.”
IBM has also released an open-source software toolkit for people to work with the analog AI chip being developed, even though the hardware is still in the test chip phase. The toolkit includes access to a device simulator with PyTorch integration. The toolkit means AI practitioners can evaluate IBM’s technology and will hopefully enable the community to develop models that extract the full potential of the hardware, even while the hardware is still being built.
With partners from across the industry, all with different interests, how does a partnership like this actually work?
“Because there are competitors sitting together around the table in this partnership, we want to make sure we respect and protect intellectual properties,” Khare said. “So for certain areas where certain IPs are very sensitive, we develop a bilateral partnership so that very specific intellectual properties are shared only between two companies, like IBM and a partner. And then depending on our bilateral agreement, some of it can be shared broadly, either IP or the results.”
IBM acts as the host for the ecosystem, bringing partners together to share progress and challenges. Khare points out that IBM has successfully been hosting partnerships for more than 25 years.
“So we create a reasonably open platform, but then while respecting individual companies’ differentiation… so that at the end, everyone gets something differentiated out of this work for their own business,” he said.
Developing a technology for broad industry adoption surely requires a certain level of openness in the ecosystem. How open is the ecosystem IBM is creating?
Khare pointed to IBM’s partnership with Red Hat, who have a history of developing and championing open-source community-based programs, and IBM’s open-source analog AI toolkit, as evidence the company is trying to create a community of developers around the ecosystem’s work.
The AI Hardware Center would also welcome new partners, including other chip makers, he said.
“We always welcome partners to join in the journey, but obviously, everyone who joins has to also contribute,” said Khare. “That will be the expectation as we build a community; the members who participate bring some assets as well, to grow the overall knowledge and the progress of the roadmap.”
The AI Hardware Center is focusing on four key areas of technology in order to achieve its goal of 1000x performance efficiency (FLOPS/W) improvement by 2029.
Digital cores: IBM is using existing semiconductor technology to build AI-specific chips using digital electronics with reduced precision. Test chips have been taped out and manufactured. Commercial chips are due to hit the market in two years.
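IBM hasn't disclosed the exact number formats its digital cores use, but the reduced-precision idea can be sketched in NumPy: quantize weights and activations to 8-bit integers, multiply-accumulate in integer arithmetic, then rescale. The symmetric quantization scheme below is illustrative, not IBM's:

```python
import numpy as np

def quantize(x, bits=8):
    """Symmetric linear quantization to signed integers (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)   # weights
a = rng.standard_normal(8).astype(np.float32)        # activations

qw, sw = quantize(w)
qa, sa = quantize(a)

# Integer multiply-accumulate, rescaled back to floating point
y_int8 = (qw @ qa) * (sw * sa)
y_fp32 = w @ a

print(np.max(np.abs(y_int8 - y_fp32)))  # small quantization error
```

The payoff is that integer multiply-accumulate units are far smaller and cheaper in energy than floating-point ones, which is where much of the FLOPS/W gain comes from.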
Analog cores: The research partnership is working on a new analog compute technology in a crosspoint array, similar to other analog compute-in-memory approaches (see: Mythic) that use an array of memory cells configured to achieve matrix multiplication quickly and with minimal power consumption. However, IBM is using phase-change memory. The current test chip (named “Fusion”) allows one phase change memory cell to be accessed at a time; future versions will allow full-scale matrix multiplication in a single time step.
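The crosspoint principle can be sketched in NumPy: each memory cell's conductance encodes one matrix weight, and applying a voltage vector to the rows yields, by Ohm's and Kirchhoff's laws, column currents equal to a full matrix-vector product in one step. The array size and the noise model below are illustrative, not IBM's PCM characterization:

```python
import numpy as np

rng = np.random.default_rng(42)

# Conductances of the crosspoint cells encode the weight matrix.
G = rng.uniform(0.0, 1.0, size=(3, 5))   # illustrative conductance values
v = rng.uniform(-1.0, 1.0, size=5)       # row voltages encode the input vector

# Ideal array: every column sums its cell currents (I = G.V) simultaneously,
# so the whole matrix-vector product happens in a single time step.
i_ideal = G @ v

# Real phase-change cells drift and are noisy; model with small perturbations.
G_noisy = G * (1.0 + rng.normal(0.0, 0.02, size=G.shape))
i_noisy = G_noisy @ v

print(i_ideal)
print(np.max(np.abs(i_noisy - i_ideal)))  # error set by device variability
```

The device-variability term is why materials work matters so much here: the accuracy of the whole computation is bounded by how precisely and stably each cell's conductance can be programmed.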
“It’s a revolutionary idea, and this is an area which truly requires materials and device innovation, because it’s a new device that we are developing for analog AI,” said Khare. “In this journey, we have built test chips which… leverage frontend wafers from Samsung, and we built a backend on these synaptic device elements in our fab in Albany.”
Heterogeneous integration: Another area of research is packaging technology to connect multiple die in the same package, which applies to both the digital and analog technologies being developed. Either could be combined with memory die, for example.
“We are investing [in heterogeneous integration technology] very heavily,” said Khare. “This enables us to reduce our dependency on monolithic integration – we can pick and choose [die], and optimise based on the workload, whether it’s for edge, data center or automotive applications. Different requirements could be met by leveraging many different kinds of chip and putting them together in a packaged form factor.”
This will offer greater flexibility than technologies such as high-bandwidth memory (HBM), currently very popular with large-scale AI accelerators. HBM is tied to DRAM, but IBM’s technology will allow the memory technology to be chosen based on the application and workload – SRAM could be chosen over DRAM, say, if the speed requirements were different, or non-volatile memory could be used.
AI technology testbed: The AI Hardware Center will host R&D, prototyping, testing and simulation for new AI cores. This includes a testbed for image, speech and text processing demos which will test the accuracy and efficiency of new technologies.
The results of this collaboration are still a little way from commercial adoption, but IBM’s digital AI chips should hit the market in around two years, while the analog chip is on track for the four-year time frame, Khare said.