Addressing AI’s ‘Hand-Me-Down Infrastructure’ Issue

Article By : Sally Ward-Foxton

A stealthy startup plans to tackle 'everything between the AI framework and the hardware'.

A stealthy Silicon Valley startup is aiming to tackle the AI software problem, once and for all.

“We as an industry are at this interesting point where everybody knows [AI’s] potential,” Modular AI co-founder and CEO Chris Lattner said in an exclusive interview with EE Times. “Everybody’s seen the research but it’s not really getting into the products, except by the biggest companies in the world. It shouldn’t be that way.”

With AI and machine learning (ML) still nascent fields, some of the technologies today’s AI/ML software stacks depend on originated as research projects.

“It was pure research, and so as a consequence it made sense for a research lab to build these kinds of tools,” Lattner said, referring to today’s widely used AI frameworks and compiler infrastructure. “Fast forward to today, it’s not research anymore.”

This, he added, is one of the key reasons AI software and tools are not fully reliable, not predictable, and make little provision for security. They simply weren’t built to be production software.

Lattner points out that today’s AI frameworks—such as Google’s TensorFlow, Meta’s PyTorch, or Google’s JAX—“are not there to make ML awesome for the entire world; they’re there to solve the problems of the company who pays for them,” and that if a company doesn’t have the same setup and use cases as those hyperscalers, then “it can work, but it’s not designed to work.”

Lattner refers to this as the “hand-me-down infrastructure” problem. Modular co-founder and chief product officer Tim Davis calls it “trickle-down infrastructure.”

Modular co-founders Chris Lattner (left) and Tim Davis (right) (Source: Modular AI)

The problem for chip companies is that changes at the framework layer have repercussions.

“[Hardware companies] have to meet that programming model, to lower it onto their hardware,” Davis said. “As those frameworks evolve, the stack has to keep evolving to meet the needs [of the hardware], to fully saturate the hardware and utilize it. That means they have to keep going back to the framework level, to be able to support all the different frameworks. Turns out, that’s very challenging.”

Over the last few years, chip companies have brought out dozens of different accelerators based on domain-specific architectures. Each one requires a bespoke compiler, which, in most cases, has to be built from the ground up.

“The cool thing about tensors and machine learning graphs is that they have parallelism implicitly as part of the compute description,” Lattner said (tensors are a data type commonly used in AI). “This means suddenly you’re at a higher level of abstraction, which means that compilers can do so much more. There’s two sides of that coin: One is they can do so much more, but the other is they have to do so much more.”
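To make that concrete, here is a minimal Python sketch (NumPy is my assumption; the article names no library) contrasting a scalar loop, which pins down one sequential execution order, with the equivalent tensor operation, where independence across elements is implicit:

```python
import numpy as np

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Scalar-loop version: the program text fixes a sequential order,
# so a compiler must first prove the iterations are independent
# before it can parallelize them.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] * b[i]

# Tensor version: one operation over whole arrays. Independence
# across elements is part of the operation's meaning, which is
# what lets an ML compiler vectorize, tile, or distribute it.
c_tensor = a * b

assert np.allclose(c_loop, c_tensor)
```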

The state of AI software is also bad news for developers since the same program may need to be deployed on multiple systems with vastly different system constraints – everything from a server to a mobile phone to a web browser.

“If every single system you want to deploy to has a different toolchain, then a team building a product has to rewrite their code over and over again,” Lattner said. “This is a huge challenge. Right now, a hardware team will have to build their own stack because there is nothing that they can plug into…. We need more of a standardizing force, which can make it easier for the hardware folks but also help the software developer’s problem—because the tools can be good.”

Lattner and Davis’s startup, Modular, intends to take on some of these problems.

“We’re tackling all the familiar problems of how do you do hardware abstraction, how do you have compilers talk to a wide range of different hardware, and how do you build the points that you can plug into with a lot of different hardware?” Lattner said. “Roughly what we’re building is a production quality version of all the tools and technology that the world’s already using.”
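Modular is still in stealth and has published no design, so the sketch below is purely hypothetical: every name in it (AcceleratorBackend, register_backend, run) is invented to illustrate what a common plug-in point between frameworks and hardware could look like, not what Modular is actually building.

```python
from abc import ABC, abstractmethod

class AcceleratorBackend(ABC):
    """The interface a hardware vendor would implement once,
    instead of re-integrating with every framework separately.
    (Hypothetical; not Modular's actual API.)"""

    @abstractmethod
    def compile(self, graph):
        """Lower a framework-neutral ML graph to a device binary."""

    @abstractmethod
    def execute(self, binary, inputs):
        """Run the compiled binary and return output tensors."""

# A shared runtime targets any registered backend, giving
# frameworks one integration point rather than one per chip.
_BACKENDS: dict[str, AcceleratorBackend] = {}

def register_backend(name: str, backend: AcceleratorBackend) -> None:
    _BACKENDS[name] = backend

def run(graph, inputs, device: str):
    backend = _BACKENDS[device]  # e.g. "vendor-npu"
    return backend.execute(backend.compile(graph), inputs)
```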

Modular plans to tackle everything between the framework and the hardware, including some common problems hardware companies face, while allowing them to build the parts of their stack that are specific to their accelerators themselves.

“We’re unlikely to be able to solve their unique problems,” he said. “But they also have common problems. Like, how do you load data? How do you plug into PyTorch? We can offer value on that side of the problem.”

This would also include tasks like image decoding and feature-embedding table lookups: in other words, things that are unrelated to AI acceleration but are nonetheless expected by customers.
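An embedding lookup, for instance, is just a gather from a table of learned vectors. A minimal sketch in Python/NumPy (illustrative; not code from the article):

```python
import numpy as np

# A learned embedding table: one 8-dimensional vector per feature ID.
vocab_size, dim = 10_000, 8
table = np.random.rand(vocab_size, dim).astype(np.float32)

# Looking up a batch of sparse feature IDs is a simple gather:
# memory-bound bookkeeping rather than matrix math, yet customers
# expect the stack around an accelerator to handle it well.
ids = np.array([42, 7, 9_981])
vectors = table[ids]  # shape: (3, 8)
```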

“There’s a whole lot of really interesting hardware out there that really struggles to get adopted because they’re just trying to get the basics running,” Lattner said.

Davis added that hardware companies struggle with changing demands from the framework, combined with constantly evolving algorithms.

“How can [evolving algorithms] be lowered to hardware without hardware companies basically having to rewrite half their AI software stack just to make that work?” he said. “This is a very concrete problem and we think there’s a significant opportunity there.”

Why does it take a brand new company to address these issues?

Lattner and Davis’s view is that most of the industry’s compiler engineers are working on making a given piece of hardware work, under tight schedule constraints. That leaves no one free to look at the wider problem.

“It’s almost like a fragmentation problem,” Lattner said. “[Compiler engineering] talent gets distributed across all the different chips: There’s no center of gravity in which you can have a team that is enabled to care about building stuff that’s not just solving the problem but is also high quality.”

Modular is building such a team, beginning with Lattner, a co-inventor of LLVM. His CV also includes Clang, Swift, and MLIR, with stints at Apple, Tesla, Google, and SiFive.

Davis previously worked on Google’s AI infrastructure, including TFLite and Android ML. Modular’s compiler engineering lead, Tatiana Shpeisman, previously led CPU and GPU compiler infrastructure for Google ML and co-created MLIR.

Other team members have backgrounds in XLA, TensorFlow, PyTorch, and ONNX. All in all, Modular employs about 30 people.

Modular’s goal is a developer platform where different slices of the company’s technology can be used in different products in different ways.

“What we’re trying to do is fundamentally help ML grow up, help the infrastructure be something everybody can depend on, and allow people to build products on top of it instead of having to worry about all these things,” Lattner said. “There are really hard problems that can be solved by using the tech—and people want to work on that problem, not work on babysitting all the different things they want to take for granted.”

Modular is still in stealth mode, but plans to release its first products next year.
